**High-Level View of Data Science**

__What is Data Science?__

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It is closely related to data mining and big data and shares their approach: "use the most powerful hardware, the most powerful programming systems, and the most efficient algorithms to solve problems".

Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.[4] It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.

__Some Applications of Data Science__

- Self-driving cars
  - E.g. Tesla, Baidu, Waymo

Airlines

- Delayed or cancelled flight information
- Route planning
- Predictive analytics
- Promotional offers
- Deciding which class of planes to purchase for better performance

- FedEx uses data science models for operational efficiency
  - Better decision making, predictive analysis, pattern discovery

Buying new furniture for the office

- Which website to use
- Check the website's rating
- Check for discounts
- Check whether the furniture is appropriate

Taking a cab

- Choose route

Netflix

- What kind of shows people are interested in
- Makes it easier to decide which advertisements are appropriate

Politics

- Elections
- Influence the voters
- Personalized messages

**Process / Steps in Data Science**

- Asking the right questions and exploring the data
- Modeling the data using various algorithms (basically for Machine Learning)
- Finally communicating and visualizing the results

__Data Science vs Business Intelligence__

| Criterion | Business Intelligence | Data Science |
|---|---|---|
| Data Source | Structured data, e.g. data warehouse | Unstructured data, e.g. web logs |
| Method | Analytical | Scientific |
| Skills | Statistics, Visualization | Statistics, Visualization, Machine Learning |
| Focus | Past and present data | Present data and future predictions |

__Prerequisites for Data Science__

- Curiosity
  - Only by asking questions do you gain a better understanding of the business problem

- Common Sense
  - To identify new ways to solve a business problem and to detect priority problems

- Communication Skills
  - Data scientists must communicate their findings to business teams so that they can act on the insights

- Machine Learning
  - Machine learning is the backbone of data science. It is one of the many techniques data science uses to find a solution to a problem.

- Mathematical Modelling
  - Mathematical models can be extremely helpful for making fast calculations and predictions from what you know of your data

- Statistics
  - It is a core foundation of data science, used to extract knowledge and obtain better results from the data

- Programming
  - You should know at least one programming language, preferably Python or R, for data modelling

- Databases
  - The discipline of querying databases teaches you to ask better questions as a data scientist

__Tools / Skills used in Data Science__

**Data Analysis**

Skills: R, Python, Statistics

Tools: SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner

**Data Warehousing**

Skills: ETL, SQL, Hadoop, Apache Spark

Tools: Informatica / Talend, AWS Redshift

**Data Visualization**

Skills: R, Python libraries

Tools: Jupyter, Tableau, Cognos, RAW

**Machine Learning**

Skills: Algebra, ML Algorithms, Statistics

Tools: Spark MLlib, Mahout, Azure ML Studio

__What does a Data Scientist do?__

- A data scientist is given a problem
- Gathers the raw data needed to solve the problem
- Processes, analyzes, and prepares the data in a format that can be fed into an analytics system, whether an ML algorithm or a statistical model
- Gets meaningful data as output
- Communicate insights to others

__Must know Machine Learning Algorithms__

- Regression (continuous data)
- Clustering (unsupervised learning technique)
- Decision Tree (classification)
- Support Vector Machine
- Naïve Bayes
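
As a small illustration of the first item in the list, regression fits a function to continuous data. The following is a minimal sketch of an ordinary least-squares straight-line fit in plain Python. The function name `fit_line` and the sample numbers are made up for illustration; in practice a library would normally be used instead.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # predicted score for 6 hours of study
```

The fitted line can then be used to predict a continuous value for an input that was not in the data, which is the sense in which regression applies to continuous data.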

__Life cycle of Data Science Project__

**Concept Study**

- Understanding the problem statement through a thorough study of the business model
- Involves:
  - Understanding the business problem
  - Asking questions
  - Getting a good understanding of the business model
  - Meeting with all stakeholders
  - Understanding what kind of data is available

**Data Preparation**

- Also known as data gathering, data munging, or data manipulation
- Formatting and structuring the data in an appropriate way
- Filling gaps in the data, such as missing values, null values, and improper data types
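
The gap-filling step can be sketched roughly as follows. This is only an illustration under stated assumptions: the records, the field names, and the helpers `to_int_or_none` and `prepare` are hypothetical, and replacing missing values with the mean is just one of several possible strategies.

```python
from statistics import mean

# Hypothetical raw records: ages arrive as strings, some missing or malformed
raw_rows = [
    {"name": "Asha", "age": "34"},
    {"name": "Ben", "age": None},
    {"name": "Chen", "age": "n/a"},
    {"name": "Dara", "age": "28"},
]


def to_int_or_none(value):
    """Convert a value to int, returning None when it is missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def prepare(rows):
    """Fix the datatype of 'age' and fill gaps with the mean of the known ages."""
    ages = [to_int_or_none(row["age"]) for row in rows]
    known = [a for a in ages if a is not None]
    fill_value = mean(known)  # assumes at least one valid age is present
    return [
        {"name": row["name"], "age": age if age is not None else fill_value}
        for row, age in zip(rows, ages)
    ]


clean_rows = prepare(raw_rows)
```

After this step every record has a numeric `age`, so later stages do not have to handle nulls or strings.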

*Elements of Data Preparation*

**Model Planning**

- The models could be statistical models or ML models
- Involves Exploratory Data Analysis (EDA) to understand the relationships between variables, to see what the data can tell us, and to check whether the data is appropriate
- Key variables are selected
- Training and test data sets are created
- Tools used:
  - R
  - Python
  - MATLAB
  - SAS

*Exploratory Data Analysis*
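
Two of the model-planning activities, checking the relation between variables and creating training and test data, can be sketched in plain Python as below. The data set and the helper names `pearson` and `split_rows` are assumptions made for illustration, and the Pearson coefficient is only one of many ways to explore a relationship.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set: (advertising spend, sales) pairs
rows = [(1, 12), (2, 15), (3, 19), (4, 22), (5, 24), (6, 29), (7, 31), (8, 36)]

spend = [r[0] for r in rows]
sales = [r[1] for r in rows]
r_value = pearson(spend, sales)  # a value near 1 suggests a strong linear relation

train, test = split_rows(rows)
```

A fixed seed is used so the split is reproducible; the test rows are held back and not used until the model is validated.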

**Model Building**

- Using various analytical tools and techniques, the data is transformed with the goal of discovering useful information to build the right model
- The built model is validated against the test data set to check its accuracy
- Decide which algorithm is to be used
- Tools used:
  - Python packages like pandas, matplotlib, numpy
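
Validating a model on test data and using the result to choose between candidates might look like the sketch below. It is not tied to any particular library: the data, the one-threshold rule built by the hypothetical helper `fit_stump`, and the always-predict-1 baseline are all assumptions chosen to keep the example short.

```python
def accuracy(predict, rows):
    """Fraction of (feature, label) rows that predict() classifies correctly."""
    correct = sum(1 for x, label in rows if predict(x) == label)
    return correct / len(rows)


def fit_stump(train_rows):
    """Pick the threshold on the single feature that maximizes training accuracy.

    The resulting rule predicts 1 when feature >= threshold, else 0.
    """
    candidates = sorted({x for x, _ in train_rows})
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), train_rows))
    return lambda x: int(x >= best)


# Hypothetical labelled data: (hours of product usage, renewed subscription?)
train_rows = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1)]
test_rows = [(2, 0), (5, 1), (9, 1), (3, 0)]

model = fit_stump(train_rows)
baseline = lambda x: 1  # always predict the majority class of the training data

# Both candidates are scored on data they were not fitted on
model_accuracy = accuracy(model, test_rows)
baseline_accuracy = accuracy(baseline, test_rows)
```

Comparing the two accuracies on the held-out test rows gives a simple basis for deciding which candidate to keep.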

**Communication**

- Key findings are identified and conveyed to the stakeholders and the team
- Explain the findings and all insights
- Get recommendations and feedback

**Operationalize**

- Put the findings into operation
- Final reports are prepared
- Code and technical documentation are delivered

__Industries with High Demand for Data Scientists__

- Gaming
- Healthcare
  - Used especially for diagnosing and predicting disease

- Finance
  - Banks, insurance companies

- Marketing
  - Finding the appropriate market

- Technology