### Advantages of This Course

Learn how to analyze large amounts of data to bring out insights

Gain hands-on knowledge through the course's problem-solving approach and a project at the end of the course

Relevant examples and cases make the learning more effective and easier

This course is for undergraduate and postgraduate students and for working professionals

Learn to analyze data using SQL, R, SAS, Python, Predictive Analytics & Machine Learning

Become one of the most in-demand data analytics professionals in the world today

## Course Curriculum

### Getting Started With Data Science And Recommender Systems

- Data Science Overview
- Reasons to Use Data Science
- Project Lifecycle
- Data Acquisition
- Evaluation of Input Data
- Transforming Data
- Statistical and Analytical Methods to Work with Data
- Machine Learning Basics
- Introduction to Recommender Systems
- Apache Mahout Overview

### Reasons To Use, Project Lifecycle

- What is Data Science?
- What Kind of Problems Can You Solve?
- Data Science Project Life Cycle
- Data Science: Basic Principles
- Data Acquisition
- Data Collection
- Understanding Data: Attributes in Data, Different Types of Variables
- Build the Variable Type Hierarchy
- Two-Dimensional Problem
- Correlation between the Variables (explained using the Paint tool)
- Outliers, Outlier Treatment
- Boxplot, How to Draw a Boxplot

### Acquiring Data

- Discussion on Boxplot, with Explanation
- Example to Understand Variable Distributions
- What is a Percentile? (example using the RStudio tool)
- How Do We Identify Outliers?
- How Do We Handle Outliers?
- Outlier Treatment: Using the General Capping/Flooring Method
- Distribution: What is a Normal Distribution?
- Why the Normal Distribution is So Popular
- Uniform Distribution
- Skewed Distribution
- Transformation
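As a preview of the capping/flooring topic, here is a minimal sketch in Python. It assumes the 5th and 95th percentiles as the floor and cap, and linear interpolation between ranks for the percentile; both are illustrative choices, not necessarily the ones used in the course. The function names `percentile` and `cap_and_floor` are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the percentile within the sorted data.
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def cap_and_floor(values, lower_pct=5, upper_pct=95):
    """Pull extreme values in to the chosen percentile bounds.

    The 5/95 defaults are an assumption made for this example.
    """
    floor = percentile(values, lower_pct)
    cap = percentile(values, upper_pct)
    return [min(max(v, floor), cap) for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # 100 is an obvious outlier
    print(cap_and_floor(data))
```

Values inside the bounds are left unchanged; only the tails are moved, which keeps every row in the data set while limiting the influence of extreme observations.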

### Machine Learning In Data Science

- Discussion about Boxplots and Outliers
- Goal: Increase Profits of a Store
- Areas for Increasing Efficiency
- Data Request
- Business Problem: To Maximize Shop Profits
- What are Interlinked Variables?
- What is Strategy?
- Interaction between the Variables
- Univariate Analysis
- Multivariate Analysis
- Bivariate Analysis
- Relation between Variables
- Standardize Variables
- What is a Hypothesis?
- Interpret the Correlation
- Negative Correlation
- Machine Learning

### Statistical And Analytical Methods Dealing With Data, Implementation Of Recommenders Using Apache Mahout And Transforming Data

- Correlation between Nominal Variables
- Contingency Table
- What is Expected Value?
- What is Mean?
- How Expected Value Differs from Mean
- Experiment: Controlled Experiment, Uncontrolled Experiment
- Degrees of Freedom
- Dependency between a Nominal Variable and a Continuous Variable
- Linear Regression
- Extrapolation and Interpolation
- Univariate Analysis for Linear Regression
- Building a Model for Linear Regression
- What Does Pattern of Data Mean?
- Data Processing Operation
- What is Sampling?
- Sampling Distribution
- Stratified Sampling Technique
- Disproportionate Sampling Technique
- Balanced Allocation (part of Disproportionate Sampling)
- Systematic Sampling
- Cluster Sampling
- Two Angles of Data Science: Statistical Learning, Machine Learning
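To illustrate how a contingency table, expected values, and degrees of freedom fit together when checking the association between two nominal variables, here is a minimal sketch in Python. It uses the standard chi-square formulation (expected count = row total × column total / grand total). The function names and the sample counts are made up for this example.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Return the chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical counts: rows are one nominal variable, columns another.
    observed = [[10, 20], [30, 40]]
    print(expected_counts(observed))
    print(chi_square(observed))
```

The statistic is then compared against a threshold for the given degrees of freedom to decide whether the two variables look independent.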

### Testing And Assessment, Production Deployment And More

- Multivariable Analysis
- Linear Regression
- Simple Linear Regression
- Hypothesis Testing
- Speculation vs. Claim (Query)
- Sample
- Steps to Test Your Hypothesis
- Performance Measure
- Generate Null Hypothesis
- Alternative Hypothesis
- Testing the Hypothesis
- Threshold Value
- Hypothesis Testing Explained by Example
- Null Hypothesis
- Alternative Hypothesis
- Probability
- Histogram of Mean Value
- Revisit the Chi-Square Independence Test
- Correlation between Nominal Variables

### Business Algorithms, Simple Approaches To Prediction, Building Model, Model Deployment

- Machine Learning
- Importance of Algorithms
- Supervised and Unsupervised Learning
- Various Algorithms in Business
- Simple Approaches to Prediction
- Prediction Algorithms
- Population Data
- Sampling
- Disproportionate Sampling
- Steps in Model Building
- Sample the Data
- What is K?
- Training Data
- Test Data
- Validation Data
- Model Building
- Find the Accuracy
- Rules
- Iteration
- Deploy the Model
- Linear Regression
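The model-building steps rely on separating training, validation, and test data. A minimal sketch of such a split in Python follows. The 60/20/20 proportions, the fixed seed, and the function name `split_data` are assumptions for illustration only.

```python
import random


def split_data(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    Whatever remains after the training and validation shares becomes
    the test set. The default proportions are illustrative assumptions.
    """
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_validation = int(len(shuffled) * validation)
    training_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_validation]
    test_set = shuffled[n_train + n_validation:]
    return training_set, validation_set, test_set


if __name__ == "__main__":
    training, validation, test = split_data(range(100))
    print(len(training), len(validation), len(test))
```

The model is fitted on the training set, tuned against the validation set, and its accuracy is reported on the test set, which it has never seen.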

### Getting Started With Segmentation Of Prediction And Analysis

- Clustering
- Cluster and Clustering with Example
- Data Points, Grouping Data Points
- Manual Profiling
- Horizontal and Vertical Slicing
- Clustering Algorithm
- Criteria to Take into Consideration before Clustering
- Graphical Example
- Clustering and Classification: Exclusive Clustering, Overlapping Clustering, Hierarchical Clustering
- Simple Approaches to Prediction
- Different Types of Distances: Manhattan, Euclidean, Cosine Similarity
- Clustering Algorithm in Mahout
- Probabilistic Clustering
- Pattern Learning
- Nearest Neighbor Prediction
- Nearest Neighbor Analysis
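The three distance measures named in this module can be sketched in a few lines of Python. This is a plain standard-library illustration using the textbook definitions, not the Mahout implementation; the function names are chosen for this example.

```python
import math


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    p, q = [0, 0], [3, 4]
    print(manhattan(p, q))
    print(euclidean(p, q))
    print(cosine_similarity([1, 2], [2, 4]))
```

Manhattan and Euclidean are distances (smaller means closer), while cosine similarity is a similarity (larger means closer), so a clustering routine typically uses one minus the cosine similarity when it needs a distance.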

### Integration Of R And Hadoop

- R Introduction
- How R is Typically Used
- Features of R
- Introduction to Big Data
- R+Hadoop
- Ways to Connect R and Hadoop
- Products
- Case Study
- Architecture
- Steps for Installing RIMPALA
- How to Create IMPALA Packages