Data Analytics with Python for Data Scientists – 3 Days

Course Description

Python has become a powerful language and environment for data science. It combines a robust, object-oriented language with a rich set of data science packages, such as NumPy, SciPy, matplotlib, scikit-learn, and pandas. Together, these tools make Python one of the strongest combinations of a general-purpose programming language and library support for data science.
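As a minimal illustration of how these packages fit together, the sketch below builds a small pandas DataFrame, converts its columns to NumPy arrays, and fits a straight line with SciPy's least-squares solver. The toy data and column names are hypothetical, chosen so that the exact answer (intercept 1, slope 2) is known in advance. It assumes NumPy, SciPy, and pandas are installed, as they are in the Anaconda distribution.

```python
import numpy as np
import pandas as pd
from scipy import linalg

# Hypothetical toy data: y is exactly 2*x + 1, so the fit should recover it
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0]})
df["y"] = 2.0 * df["x"] + 1.0

# pandas -> NumPy: design matrix with an intercept column of ones
X = np.column_stack([np.ones(len(df)), df["x"].to_numpy()])
y = df["y"].to_numpy()

# SciPy linear algebra: least-squares solution of X @ coef ≈ y
coef, residues, rank, singular_values = linalg.lstsq(X, y)
intercept, slope = coef
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=1.00, slope=2.00
```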

What You Will Learn

  • Quick Python primer
  • Quick primer on data science algorithms
  • NumPy
  • SciPy
  • Pandas
  • Scikit-learn

Intended Audience

Data Analysts, Data Scientists, Developers

Format: 50% lecture, 50% hands-on labs

Lab Environment

Zero Install: There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.

Prerequisites

Students will need the following:


  • Experience and background in software development. Some background in analytics or machine learning is helpful.
  • Some background in Python is highly recommended, though a brief introduction is included.

Course Outline

  1. Python Language Overview
    • Basics of the Python language
    • How to edit, run, and test Python code
    • Introducing the Anaconda distribution of Python
    • IDEs
    • Using Jupyter notebooks
  2. Pandas
    • Series and DataFrames
    • Loading data using Pandas
    • Labs
  3. NumPy and SciPy
    • Arrays
    • Matrices
    • Linear Algebra
    • Labs
    • Visualizing data with matplotlib
  4. Doing Data Science with Scikit-learn
    • Introducing Scikit-learn
    • Clustering Data
    • Building a Classifier
  5. Big Data With PySpark
    • Introduction to Spark and PySpark
    • Using the Spark framework for Big Data
    • Using MLlib for Data Science in PySpark
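The clustering and classification topics in module 4 can be sketched with the small Iris dataset that ships with scikit-learn. This is a minimal example, not the course's lab material: the choice of KMeans for clustering, logistic regression for the classifier, and all parameter values are assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the bundled Iris dataset; as_frame=True returns pandas objects
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Clustering: group the 150 samples into 3 clusters without using the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print(pd.Series(cluster_ids).value_counts().sort_index())

# Classification: hold out 25% of the rows, train on the rest, score the hold-out
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Fixing `random_state` makes the cluster assignment and the train/test split reproducible between runs; `n_init` is set explicitly because its default has changed across scikit-learn versions.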

