Data Science with Hadoop for Statistics and Text Analysis – 3 Days

Course Description

The course teaches data science principles and applications through lectures and hands-on labs. Students learn the strengths of each tool and how to select the right one for the job, while gaining practical experience building working systems.

Intended Audience

This course is intended for architects, software developers, analysts and data scientists who need to understand how to apply data science to large datasets with Hadoop.

Course Objectives


  • Understand the foundations of data science
  • Understand the principles of machine learning
  • Learn about Hadoop and its interaction with data science
  • Learn to program in R and to use it for statistical analysis
  • Analyze texts with Python NLTK
  • Understand recommender systems
  • Compare implementing a recommender with R and Mahout

Prerequisites


Students must have basic computer skills, a basic knowledge of statistics, and a basic understanding of programming or scripting. Prior experience with Hadoop, Mahout, R or Python is helpful but not required.


  • Lab Content
    • Set Up Development Environment
    • Defining the Problem
    • Programming in R
    • Analyzing Data with R
    • Creating the User/Item Matrix
    • Recommender Lab with R
    • Recommender Lab with Mahout