Data Analytics with Python for Data Scientists – 3 Days
Python has become a powerful language and environment for data science. It combines a robust, object-oriented language with a rich ecosystem of data science packages such as NumPy, SciPy, matplotlib, scikit-learn, and pandas. This pairing of a solid general-purpose language with strong library support makes Python one of the best choices for data science work.
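As a rough illustration of how these packages complement one another, the minimal sketch below wraps a small NumPy array in a pandas DataFrame, prints summary statistics, and fits a straight line with NumPy's least-squares solver. The column names and values are hypothetical and chosen only for illustration; this is not an excerpt from the course labs.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: 5 samples, 2 features (values are illustrative only)
values = np.array([[1.0, 2.0],
                   [2.0, 4.1],
                   [3.0, 6.2],
                   [4.0, 7.9],
                   [5.0, 10.1]])

# Wrap the NumPy array in a pandas DataFrame with labeled columns
df = pd.DataFrame(values, columns=["x", "y"])

# Column-wise summary statistics (count, mean, std, quartiles) computed by pandas
print(df.describe())

# Fit y ~ a*x + b: build the design matrix [x, 1] and solve with NumPy least squares
A = np.column_stack([df["x"].to_numpy(), np.ones(len(df))])
(a, b), *_ = np.linalg.lstsq(A, df["y"].to_numpy(), rcond=None)
print(f"slope={a:.3f}, intercept={b:.3f}")
```

pandas handles the labeled, tabular side of the work, while NumPy supplies the underlying arrays and numerical routines; the two interoperate through `to_numpy()` without copying code between libraries.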
What You Will Learn
- Quick Python primer
- Quick primer on data science algorithms
Audience: Data Analysts, Data Scientists, Developers. Format: 50% lecture, 50% hands-on labs
Zero Install: There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.
Students will need the following:
- an SSH client (Linux and Mac already have SSH clients; for Windows, PuTTY is recommended)
- a browser to access the cluster. We recommend the Firefox browser with the FoxyProxy extension installed
- Experience in software development. Some background in analytics or machine learning is helpful.
- Some background in Python is highly recommended, though a brief introduction is included.
Course Outline
- Python Language Overview
  - Basics of the Python language
  - How to edit, run, and test Python code
  - Introducing the Anaconda distribution of Python
  - Using Jupyter notebooks
- Series and DataFrames
- Loading data using pandas
- NumPy and SciPy
  - Linear Algebra
- Visualizing data with matplotlib
- Doing Data Science with scikit-learn
  - Introducing scikit-learn
  - Clustering Data
  - Building a Classifier
- Big Data with PySpark
  - Introduction to Spark and PySpark
  - Using the Spark framework for Big Data
  - Using MLlib for Data Science in PySpark
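The scikit-learn topics in the outline (clustering data and building a classifier) can be previewed with a minimal sketch on synthetic data. It assumes scikit-learn is installed (it is included in the Anaconda distribution) and uses `make_blobs`, `KMeans`, and `LogisticRegression` purely as illustrative choices; the course labs may use different datasets and estimators.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: 300 points drawn around 3 centers (no real dataset assumed)
X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Clustering: KMeans groups the points without ever looking at the labels y
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])

# Classification: hold out 25% of the data, train on the rest, score on the hold-out
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# score() returns mean accuracy on the held-out set
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The contrast is the point of the sketch: clustering is unsupervised and only sees `X`, while the classifier is supervised and learns from the labels `y`, which is why it is evaluated on data it did not see during training.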