
Python for Data Analysis and Machine Learning

5 Days – Onsite, Custom, Lowest Price

Course Description – Python for Data Analysis and Machine Learning

This Python for Data Analysis and Machine Learning training course is intended for data scientists and software engineers. It gives them practical, hands-on experience through a combination of about 50% lecture and 50% lab work.

Python is a popular open source language. It has libraries for almost everything, including web programming, administrative tasks, system programming, mathematics, machine learning, and graphics.

What You Will Learn

  • Installing Python and writing basic scripts
  • Using built-in data structures
  • Using all flow control features
  • Reading from and writing to files
  • Using Python’s extensive libraries and functions
  • Accessing databases
  • Using Jupyter Notebook
  • Reading, cleaning, and structuring data
  • Analyzing data
  • Visualizing data

Prerequisites and Audience

  • Ability to navigate the Linux command line
  • Familiarity with programming

Audience: Data Scientists, Developers, Administrators.

Lab Environment

A working environment will be provided for students. Students need only an SSH client and a browser.
Zero Install: There is no need to install software on students’ machines.

Outline

  • Python Introduction
    • Installing Python
    • Python Versions
    • IDEs
    • Jupyter Notebook
  • Python Language Overview and First Steps
    • Data Types
    • NumPy
    • Packages
    • Pandas
  • Python OOP
    • Classes
    • Modules/Packages
    • Python Packages
    • Data Types
  • Pandas
    • DataFrames
    • Schema inferences
    • Data exploration
  • NumPy
    • Capabilities
    • Data types
    • Packages
  • Python – DB Programming
    • Database Connectivity
    • Pandas and DB
    • ORM
  • Python – Web Programming
    • Python Web Frameworks
    • Flask
    • RESTful API with Flask
  • Visualization
    • Pandas visualization
    • Matplotlib
    • Seaborn
    • Ggplot
  • Doing Data Science with Scikit-learn
    • Introducing Scikit-Learn
    • Clustering Data
    • Building a Classifier

NLTK

  • Bag-of-words (NLTK labs in Python)
  • Bag-of-n-Grams
  • Filtering (NLTK labs, later spaCy)
  • Stopwords
  • Frequency-based
  • Stemming
  • Parsing and tokenization
  • TF-IDF
  • spaCy for semantic pipeline and named entity recognition
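
To give a feel for the bag-of-words and TF-IDF topics above, here is a minimal, library-free sketch. It is not taken from the course labs: the toy corpus, the whitespace tokenizer, and the tf_idf helper are assumptions made for illustration, and the labs themselves use NLTK and spaCy.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Naive whitespace tokenization; a real pipeline would use a proper tokenizer.
tokenized = [doc.split() for doc in docs]

# Bag-of-words: one term-count mapping per document.
bags = [Counter(tokens) for tokens in tokenized]

# Document frequency: the number of documents that contain each term.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))


def tf_idf(term, bag):
    """Term frequency times log inverse document frequency (unsmoothed)."""
    if doc_freq[term] == 0:
        return 0.0  # term never appears in the corpus
    tf = bag[term] / sum(bag.values())
    idf = math.log(len(docs) / doc_freq[term])
    return tf * idf


# Score every term of the first document.
scores = {term: tf_idf(term, bags[0]) for term in bags[0]}
```

In this sketch a frequent but widespread word such as "the" scores lower than a rarer word such as "cat", which is the effect TF-IDF weighting is meant to produce.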

Section 1: Machine Learning (ML) Overview

  • Machine Learning landscape
  • Machine Learning applications
  • Understanding ML algorithms & models (supervised and unsupervised)

Section 2: Machine Learning Environment

  • Introduction to Jupyter notebooks / RStudio
  • Lab: Getting familiar with ML environment

Section 3: Machine Learning Concepts

  • Statistics Primer
  • Covariance, Correlation, Covariance Matrix
  • Errors, Residuals
  • Overfitting / Underfitting
  • Cross validation, bootstrapping
  • Confusion Matrix
  • ROC curve, Area Under Curve (AUC)
  • Lab: Basic stats
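
As an illustration of the confusion matrix topic listed above, the following sketch counts the four outcome types for a binary classifier and derives accuracy, precision, and recall from them. The label lists are hypothetical values invented for this example, and the code uses only the standard library rather than any particular ML toolkit.

```python
from collections import Counter

# Hypothetical labels (1 = positive, 0 = negative), invented for illustration.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Count each (actual, predicted) pair once.
counts = Counter(zip(actual, predicted))
tp = counts[(1, 1)]  # true positives
fn = counts[(1, 0)]  # false negatives
fp = counts[(0, 1)]  # false positives
tn = counts[(0, 0)]  # true negatives

# Metrics derived from the confusion matrix.
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```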

Section 4: Feature Engineering (FE)

  • Preparing data for ML
  • Extracting features, enhancing data
  • Data cleanup
  • Visualizing Data
  • Lab: data cleanup
  • Lab: visualizing data

Section 5: Linear regression

  • Simple Linear Regression
  • Multiple Linear Regression
  • Running LR
  • Evaluating LR model performance
  • Lab
  • Use case: House price estimates
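
For a sense of what simple linear regression computes, here is a minimal ordinary-least-squares sketch in plain Python. The house sizes and prices are hypothetical numbers chosen for illustration, not course data, and the predict helper is an assumed name.

```python
# Hypothetical data: house size (square meters) vs. price (thousands).
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 200.0, 250.0, 300.0]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Least-squares slope: covariance of x and y divided by the variance of x.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / sum(
    (x - mean_x) ** 2 for x in sizes
)
# The fitted line passes through the point of means.
intercept = mean_y - slope * mean_x


def predict(size):
    """Estimate a price from the fitted line."""
    return intercept + slope * size


print(f"price = {intercept:.1f} + {slope:.2f} * size")
```

Multiple linear regression follows the same idea with several input columns, which is where a library implementation becomes more practical than hand-written sums.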

Section 6: Logistic Regression

  • Understanding Logistic Regression
  • Calculating Logistic Regression
  • Evaluating model performance
  • Lab
  • Use case: credit card application, college admissions

Section 7: Classification: SVM (Support Vector Machines)

  • SVM concepts and theory
  • SVM with kernel
  • Lab
  • Use case: Customer churn data

Section 8: Classification: Decision Trees & Random Forests

  • Theory behind trees
  • Classification and Regression Trees (CART)
  • Random Forest concepts
  • Labs
  • Use case: predicting loan defaults, estimating election contributions

Section 9: Classification: Naive Bayes

  • Theory behind Naive Bayes
  • Running NB algorithm
  • Evaluating NB model
  • Lab
  • Use case: spam filtering

Section 10: Clustering (K-Means)

  • Theory behind K-Means
  • Running K-Means algorithm
  • Estimating the performance
  • Lab
  • Use case: grouping cars data, grouping shopping data
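
The K-Means procedure named above alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. The sketch below shows that loop in plain Python under simplifying assumptions: the points are hypothetical, the starting centroids are picked by hand, and it runs a fixed number of iterations instead of testing for convergence.

```python
import math

# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def kmeans(points, centroids, iterations=10):
    """Plain K-Means with a fixed iteration count (no convergence test)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster ended up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else old
            for cluster, old in zip(clusters, centroids)
        ]
    # clusters holds the assignment made in the final iteration.
    return centroids, clusters


# Hand-picked starting centroids, one from each apparent group.
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

Library implementations typically add random or smarter initialization and a convergence check, which this sketch leaves out for brevity.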

Section 11: Principal Component Analysis (PCA)

  • Understanding PCA concepts
  • PCA applications
  • Running a PCA algorithm
  • Evaluating results
  • Lab
  • Use case: analyzing retail shopping data

Section 12: Recommendation (Collaborative filtering)

  • Recommender systems overview
  • Collaborative Filtering concepts
  • Lab
  • Use case: movie recommendations, music recommendations

Section 13: Final workshop (time permitting)

Students will analyze a couple of datasets and run ML algorithms.
This is done as a group exercise. Each group will present their findings to the class.

Other courses to explore:

Advanced Python Training – Onsite, Custom, Lowest Price

Introduction to Python 3 – Onsite, Custom, Lowest Price

Design Patterns in Java – Onsite, Custom, Lowest Price

Overview of Java EE Development – Onsite, Custom, Lowest Price

XML and Web Services Training – Onsite, Custom, Lowest Price


 MindIQ.com 
