A High-Level View of Data Science

What is Data Science?

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Data science is closely related to data mining and big data, which share the aim to "use the most powerful hardware, the most powerful programming systems, and the most efficient algorithms to solve problems".

Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.[4] It employs techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.

Some Applications of Data Science

  • Self-driving cars
    • E.g. Tesla, Baidu, Waymo
  • Airlines
    • Information on delayed or cancelled flights
    • Route Planning
    • Predictive Analytics
    • Promotional Offers
    • Deciding which class of planes to purchase for better performance
  • FedEx uses data science models for operational efficiency
  • Better decision making, predictive analysis, pattern discovery

Buying New Furniture for the Office

  • Decide which website to use
  • Check the website's ratings
  • Check for discounts
  • Check whether the furniture is appropriate

Taking a Cab

  • Choose route


  • Find out what kinds of shows people are interested in
  • Makes it easier to decide on appropriate advertisements


  • Elections
    • Influence the voters
    • Personalized messages

Process / Steps in Data Science

  • Asking the right questions and exploring the data
  • Modeling the data using various algorithms (basically for Machine Learning)
  • Finally communicating and visualizing the results


Data Science vs Business Intelligence


Data Source

  • Business Intelligence: structured data, e.g. a data warehouse
  • Data Science: unstructured data, e.g. web logs

Skills

  • Business Intelligence: statistics, visualization
  • Data Science: statistics, visualization, machine learning

Focus

  • Business Intelligence: past and present data
  • Data Science: present data and future predictions


Prerequisites for Data Science

  • Curiosity
    • Only by asking questions will you gain a better understanding of the business problem
  • Common Sense
    • To identify new ways to solve a business problem and to detect priority problems
  • Communication Skills
    • Communicate findings to business teams so that they can act on the insights
  • Machine Learning
    • Machine learning is the backbone of data science. It is one of the many methods data science uses to find a solution to a problem.
  • Mathematical Modelling
    • Mathematical models can be extremely helpful for making fast calculations and predictions from what you know about your data
  • Statistics
    • It is a core foundation of data science, used to extract knowledge and obtain better results from the data
  • Programming
    • You should know at least one programming language, preferably Python or R for data modelling
  • Databases
    • The discipline of querying databases teaches you to ask better questions as a Data Scientist

Tools / Skills used in Data Science

Data Analysis

Skills: R, Python, Statistics

Tools: SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner


Data Warehousing

Skills: ETL, SQL, Hadoop, Apache Spark

Tools: Informatica / Talend, AWS Redshift


Data Visualization

Skills: R, Python libraries

Tools: Jupyter, Tableau, Cognos, RAW


Machine Learning

Skills: Algebra, ML Algorithms, Statistics

Tools: Spark MLlib, Mahout, Azure ML Studio


What does a Data Scientist do?

  • A data scientist is given a problem
  • Raw data is gathered to solve the problem
  • The data is processed, analyzed and prepared into a format that can be fed into an analytics system, be it an ML algorithm or a statistical model
  • Meaningful results are obtained as output
  • Communicate insights to others


Must-Know Machine Learning Algorithms

  • Regression (continuous data)
  • Clustering (unsupervised learning technique)
  • Decision Tree (classification)
  • Support Vector Machine
  • Naïve Bayes
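
As an illustration of the first item in the list above, here is a minimal sketch of simple linear regression fitted by ordinary least squares, using only the Python standard library. The data (years of experience against salary) and the function name fit_line are made up for this example; a real project would more likely use a dedicated library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data: years of experience vs. salary (in thousands)
experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

slope, intercept = fit_line(experience, salary)

# Predict a continuous value for an unseen input (6 years of experience)
predicted = slope * 6 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted:.2f}")
```

Because the made-up points lie exactly on a line, the fit recovers that line; real data would leave some residual error.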


Life cycle of Data Science Project

Concept Study

  • Understanding the problem statement through a thorough study of the business model
  • Involves
    • Understanding the business problem
    • Asking questions
    • Getting a good understanding of the business model
    • Meeting with all stakeholders
    • Understanding what kind of data is available



Data Preparation

  • Also known as Data Gathering, Data Munging, Data Manipulation
  • Formatting and structuring of data in an appropriate way
  • Fill the gaps in the data, such as missing values, null values, improper data types, etc.
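
The gap-filling step above can be sketched as follows, assuming a small list of hypothetical records whose "age" field arrives as a string, an empty value, or None. The helper names to_int_or_none and prepare are invented for this example, and filling with the median is only one of several possible strategies.

```python
from statistics import median

# Hypothetical raw records: ages arrive as strings, blanks, or None
raw_rows = [
    {"name": "A", "age": "34"},
    {"name": "B", "age": None},
    {"name": "C", "age": ""},
    {"name": "D", "age": "41"},
    {"name": "E", "age": "n/a"},
    {"name": "F", "age": "29"},
]


def to_int_or_none(value):
    """Fix the data type: return an int, or None if missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def prepare(rows):
    """Return copies of the rows with 'age' as int and gaps filled by the median."""
    parsed = [to_int_or_none(row["age"]) for row in rows]
    known = [age for age in parsed if age is not None]
    fill_value = median(known)
    return [
        {**row, "age": age if age is not None else fill_value}
        for row, age in zip(rows, parsed)
    ]


clean_rows = prepare(raw_rows)
for row in clean_rows:
    print(row)
```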

[Figure: Elements of Data Preparation]

Model Planning

  • Could be statistical models, ML models
  • Involves Exploratory Data Analysis (EDA) to understand the relationships between variables, to see what the data can tell us, and to check whether the data is appropriate
  • Key variables are selected
  • Training and test data sets are created
  • Tools Used
    • R
    • Python
    • MATLAB
    • SAS

[Figure: Exploratory Data Analysis]


Model Building

  • Using various analytical tools and techniques, the data is transformed with the goal of discovering useful information to build the right model
  • The built model is validated against the test data set to find the best accuracy
  • Decide which Algorithm is to be used
  • Tools Used
    • Python packages like pandas, Matplotlib, NumPy
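
Validating models on test data and choosing between them, as described above, might look like the following sketch. The test rows and the two candidate "models" are hypothetical stand-ins (simple hand-written rules rather than trained algorithms), used only to show how accuracy on held-out data drives the choice.

```python
def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    correct = sum(1 for features, label in rows if model(features) == label)
    return correct / len(rows)


# Hypothetical held-out test set: ((hours_studied, hours_slept), passed_exam)
test_rows = [
    ((2, 8), False),
    ((9, 6), True),
    ((5, 7), True),
    ((1, 5), False),
    ((7, 4), True),
    ((6, 9), False),
]

# Stand-ins for two already-built models (assumed, not trained here)
candidates = {
    "study_rule": lambda f: f[0] >= 5,  # predicts a pass from hours studied
    "sleep_rule": lambda f: f[1] >= 7,  # predicts a pass from hours slept
}

# Score every candidate on the same test data and keep the most accurate
scores = {name: accuracy(model, test_rows) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: accuracy={score:.2f}")
print(f"selected model: {best_name}")
```

Accuracy is only one possible metric; depending on the problem, others such as precision or recall may be more appropriate.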


Communication

  • Key findings are identified and conveyed to the stakeholders and team
  • Explain the findings and all insights
  • Gather recommendations and feedback


Operationalization

  • Put the findings into operation
  • Final reports are prepared
  • Code and technical documentation are delivered

Industries with High Demand for Data Scientists

  • Gaming
  • Healthcare
    • Used especially for diagnosing and predicting disease
  • Finance
    • Banks, insurance companies
  • Marketing
    • Finding the appropriate market
  • Technology
