What is Data Science & Introduction
What is Data Science?
Data science is the in-depth study of large amounts of data. It involves extracting meaningful insights from raw, structured, and unstructured data using the scientific method, various technologies, and algorithms.
Data science is a multidisciplinary field that uses tools and techniques to manipulate data so that you can find something new and meaningful.
Data science draws on powerful hardware, programming systems, and efficient algorithms to solve data-related problems. It is often described as the future of artificial intelligence.
In short, we can say that data science is all about:
- Asking the correct questions and analyzing the raw data.
- Modeling the data using various complex and efficient algorithms.
- Visualizing the data to get a better perspective.
- Understanding the data to make better decisions and reach a final result.
For example:
Suppose we want to travel from station A to station B by car. To make a good decision, we need to consider several factors:
- which route is the best overall,
- which route gets us there fastest,
- which route has no traffic jams,
- which route is cost-effective and easy to drive.
All of these decision factors act as input data, and analyzing them gives us an appropriate answer. This analysis of data is called data analysis, which is a part of data science.
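The route decision above can be sketched in a few lines of Python. This is only an illustration: the `Route` fields, the scoring rule (total time plus a weighted cost), and all the numbers are assumptions made up for the example, not data from a real trip.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    travel_minutes: float         # expected driving time
    traffic_delay_minutes: float  # expected delay from traffic jams
    cost: float                   # fuel plus tolls, in one currency


def route_score(route: Route, cost_weight: float = 1.0) -> float:
    """Lower is better: total time plus a weighted cost penalty."""
    total_time = route.travel_minutes + route.traffic_delay_minutes
    return total_time + cost_weight * route.cost


def best_route(routes: list[Route], cost_weight: float = 1.0) -> Route:
    """Return the route with the lowest combined score."""
    return min(routes, key=lambda r: route_score(r, cost_weight))


# Hypothetical input data for three routes from station A to station B.
routes = [
    Route("highway", travel_minutes=40, traffic_delay_minutes=25, cost=12.0),
    Route("city streets", travel_minutes=55, traffic_delay_minutes=5, cost=6.0),
    Route("toll road", travel_minutes=35, traffic_delay_minutes=0, cost=20.0),
]

choice = best_route(routes)
print(choice.name)
```

Raising `cost_weight` makes the choice favor cheaper routes over faster ones, which shows how the same input data can lead to different decisions depending on what we care about.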
Introduction to Data Science
Data science is a field that involves using statistical and computational techniques to extract insights and knowledge from data.
It encompasses a wide range of tasks, including data cleaning and preparation, data visualization, statistical modeling, machine learning, and more.
Data scientists use these techniques to discover patterns and trends in data, make predictions, and support decision-making.
They may work with a variety of data types, including structured data (such as numbers and dates in a spreadsheet) and unstructured data (such as text, images, or audio).
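As a small illustration of data cleaning and basic statistics on structured data, here is a sketch that uses only the Python standard library. The CSV content, the column names, and the `load_clean_sales` helper are hypothetical and exist only for this example.

```python
import csv
import io
import statistics

# Hypothetical structured data, as it might appear in a spreadsheet export.
RAW_CSV = """date,store,sales
2024-01-01,North,120.5
2024-01-02,North,
2024-01-03,South,98.0
2024-01-04,South,not available
2024-01-05,North,143.0
"""


def load_clean_sales(text: str) -> list[float]:
    """Parse the CSV text and keep only rows with a numeric sales value."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            values.append(float(row["sales"]))
        except ValueError:
            # Data cleaning step: skip blank or non-numeric entries.
            continue
    return values


sales = load_clean_sales(RAW_CSV)
print(len(sales), "valid rows")
print("mean:", statistics.mean(sales))
print("median:", statistics.median(sales))
```

Real projects often use a library such as pandas for this kind of work, but the steps are the same: load the data, remove what cannot be used, then summarize.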
Data science is used in a wide range of industries, including finance, healthcare, retail, and more. It has become one of the most in-demand jobs of the 21st century, and many organizations are looking for candidates with knowledge of data science.
Many tools are used in data science. Some of the most frequently used include:
- Apache Hadoop: a free, open-source framework that can manage and store large amounts of data.
- SAS (Statistical Analysis System): a statistical tool developed by SAS Institute and used by large organizations to analyze data.
- Apache Spark: an open-source engine used for analyzing and processing large-scale data.
There are also general-purpose tools like MS Excel, a fundamental tool that makes it easy to analyze and understand data.
About GitHub Version Control
About Version Control
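Version control software keeps a history of a project by saving snapshots of its files over time. Git is one such system, and it stores that history in what is called a repository.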
About Repositories
When you start a new project, you should make a folder to contain just the stuff for that project.
When you want to back your work up on another computer, there are websites that specialize in git. The most popular is GitHub, acquired by Microsoft in 2018. In these notes, we’ll teach you how to use GitHub and assume that’s where you’re publishing your work.
If you want git to start tracking a folder and keeping snapshots, to enable the features listed above, you have to turn the folder into what is called a git repository, or for short, a repo.
By default, a folder on your computer is not tracked by git.
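To make this concrete, here is a minimal sketch that checks whether a folder is a git repository and turns it into one. It assumes the usual layout where a repository has a hidden `.git` subfolder, and it runs the real `git init` command only if git is installed. The helper names `is_git_repo` and `init_repo` and the folder name `my-project` are made up for this example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def is_git_repo(folder: Path) -> bool:
    """In the common case, a tracked folder contains a hidden .git subfolder."""
    return (folder / ".git").is_dir()


def init_repo(folder: Path) -> None:
    """Run `git init` in the folder, turning it into a repository."""
    subprocess.run(["git", "init"], cwd=folder, check=True, capture_output=True)


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my-project"  # hypothetical project folder
    project.mkdir()
    print(is_git_repo(project))         # a plain folder is not tracked

    if shutil.which("git"):             # only try if git is installed
        init_repo(project)
        print(is_git_repo(project))
```

In everyday use you would simply type `git init` in a terminal inside the project folder. The Python wrapper is here only to show what changes on disk.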
About Tracking Changes in the Repository
As you work on the project, you will inevitably have ups and downs. Maybe it goes like this:
You start by downloading a dataset from the instructor and starting a new blank Python script or Jupyter notebook in your repo folder.
Everything’s fine so far. You try to load the dataset but keep getting errors.
A friend at dinner reminded you about setting the text encoding, and that fixed the problem.
You get the dataset loading before bed. You get the data cleaned without a problem.
During class, the instructor asks your team to make progress on a hypothesis test, but you run out of time in class before you can figure out all the details. The last few lines of code still give errors.
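The text-encoding fix mentioned in the story above might look something like the following sketch. The story does not say which encoding the dataset used, so Latin-1 is an assumption here, and the file name and contents are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.csv"  # hypothetical dataset file
    # Simulate a file that was saved in Latin-1 rather than UTF-8.
    path.write_bytes("city,café_count\nMontréal,12\n".encode("latin-1"))

    try:
        path.read_text(encoding="utf-8")
    except UnicodeDecodeError as err:
        # This is the kind of error you keep getting with the wrong encoding.
        print("UTF-8 failed:", err.reason)

    # Naming the encoding the file actually uses fixes the load.
    text = path.read_text(encoding="latin-1")
    print(text.splitlines()[1])
```

Committing right after a fix like this gives you a working snapshot to return to if later changes break the loading step again.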
Sharing online:
In git, a site on which you back up or publish a repository is called a remote. This is in contrast to the repo folder on your computer, which is called your local copy.
There are three important terms to know regarding dealing with remotes in git; I’ll phrase each of them in terms of using GitHub, but the same terms apply to any remote:
For repositories you created:
Sending my most recent commits to GitHub is called pushing my changes (that is, my commits).
For repositories someone else created:
Getting a copy of a repository is called cloning the repository. It’s not the same as downloading. A download contains just the latest version; a clone contains all past snapshots, too.
If the original author updates the repository with new content and I want to update my clone, that’s called pulling the changes (opposite of push, obviously).
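The difference between downloading, cloning, pushing, and pulling can be shown with a toy model. This is not how git is implemented: the `Repo` class and the four functions are hypothetical, and the model assumes a simple linear history in which the two copies never diverge.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy stand-in for a repository: an ordered list of snapshots."""
    commits: list[str] = field(default_factory=list)

    def commit(self, snapshot: str) -> None:
        self.commits.append(snapshot)


def download(remote: Repo) -> str:
    """A download holds only the latest version."""
    return remote.commits[-1]


def clone(remote: Repo) -> Repo:
    """A clone holds every past snapshot as well."""
    return Repo(list(remote.commits))


def push(local: Repo, remote: Repo) -> None:
    """Send the commits that the remote does not have yet."""
    remote.commits.extend(local.commits[len(remote.commits):])


def pull(local: Repo, remote: Repo) -> None:
    """Fetch the commits that the local copy does not have yet."""
    local.commits.extend(remote.commits[len(local.commits):])


github = Repo()                 # stands in for the remote
author = Repo()                 # the original author's local copy
author.commit("load dataset")
author.commit("clean data")
push(author, github)

latest_only = download(github)  # just the newest snapshot
my_clone = clone(github)        # the full history

author.commit("hypothesis test")
push(author, github)
pull(my_clone, github)          # my clone catches up with the author

print(latest_only)
print(my_clone.commits)
```

The downloaded copy stays frozen at the version it was taken from, while the clone can keep receiving new snapshots through pull.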
Although technically it’s possible to pull and push to the same repository,