
Data Science Training Bangalore – ITPL, Whitefield Sep 2015

 


Duration: 100 Hours, One Month

Fees: INR 10,000/-

Location: InfoVision Solution India Pvt Ltd, 7th Floor, Discoverer, ITPL, Whitefield, Bangalore – 560066

Schedule: 2-4 hours per day, with flexible hours on weekdays and weekends

Outcome: Able to independently deliver data science projects

Certification: InfoVision Certified Data Scientist Level 1

Free Demo Class

On Saturday 22nd August 2015, from 10:00 AM to 12:00 PM, at InfoVision Solution India Pvt Ltd, 7th Floor, Discoverer, ITPL, Whitefield, Bangalore – 560066

Register for Free Demo Class: http://goo.gl/forms/Sq1xb6AeQJ

What is Big Data?

Big data refers to datasets whose volume, velocity, variety, and complexity exceed the ability of commonly used software tools to capture, process, store, manage, and analyze them. Big Data combines different types of data:

• Unstructured: data communicated every day by email, phone, text, tweet, and video

• Semi-structured: data generated by machines, for example, log files

• Structured: data traditionally stored in databases, such as account information and credit card transactions

The challenge of Big Data is to efficiently and effectively capture, store, manage, and analyze 100 percent of the data to drive business insight and timely decisions.

 

What is Apache Hadoop?

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
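To make "distributed processing" more concrete, here is a minimal sketch of the MapReduce programming model that Hadoop is built around, using the classic word-count example. It is plain Python running in a single process. It does not use Hadoop's Java API or Hadoop Streaming, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration. On a real cluster, the framework would typically run many mappers in parallel on the nodes holding the HDFS blocks, then shuffle their output to parallel reducers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all counts for one word into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Here every phase runs sequentially in one process; Hadoop would
    # distribute the map and reduce calls across the cluster.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data needs big clusters", "hadoop stores big data"]
    print(word_count(sample))
```

Because each mapper only looks at its own line and each reducer only sees the values for its own key, both phases can be split across machines without coordination. That independence is what lets the model scale out on commodity hardware.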

 

What is R Programming?

R is a programming language and software environment for statistical computing and graphics. It is widely used among statisticians and data miners for developing statistical software and performing data analysis.

 

What is Pentaho?

The Pentaho BI Project is an open-source business intelligence (BI) suite that provides organizations with reporting, data integration (ETL), OLAP analysis, and dashboarding tools for their enterprise BI needs.


Course Contents

Statistics and Business Analytics – 40 Hours

  • Introduction to R Programming – RStudio, Shiny
  • Introduction to Data Analytics
  • Statistics – Mean, Mode, Median, Standard Deviation
  • Introduction to WEKA
  • Classification, Association Rules, Regression Analysis, Cluster Analysis
  • Algorithms – K-means, TwoStep, Kohonen networks, Apriori, and GRI
  • Decision Trees and Clustering
  • Projects: Patient Analytics, Automobile Analytics, Football Analytics, Stock Market Analytics
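As a quick illustration of the descriptive statistics listed above, the sketch below computes each measure on a small, made-up sample. The course teaches these in R; this example uses Python's standard `statistics` module instead, purely to show what each measure computes. Both the population and the sample standard deviation are shown, because tools differ in which one they report by default.

```python
import statistics

# Hypothetical sample values, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(values),      # arithmetic average: sum / n
    "median": statistics.median(values),  # middle value; average of the two middle values when n is even
    "mode": statistics.mode(values),      # most frequent value
    "pstdev": statistics.pstdev(values),  # population standard deviation (divides by n)
    "stdev": statistics.stdev(values),    # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name:>7}: {value:.3f}")
```

For this sample the mean is 5, the median is 4.5 (the average of 4 and 5), the mode is 4, and the population standard deviation is 2, since the squared deviations sum to 32 and 32 / 8 = 4.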

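K-means, one of the algorithms in the list above, alternates two steps until the cluster assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. Below is a minimal pure-Python sketch of that loop (Lloyd's algorithm) for 2-D points. It is not how WEKA or R implement it. The function name `kmeans` is hypothetical, and seeding with the first k points is a simplifying assumption, since libraries typically use random or k-means++ initialization.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    # Simplifying assumption: the first k points serve as the initial centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance) for each point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
        if new_labels == labels:
            break  # assignments did not change: converged
        labels = new_labels
    return centroids, labels


if __name__ == "__main__":
    data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
    print(kmeans(data, k=2))
```

On this toy data the two tight groups separate after a few iterations. Because the result depends on the starting centroids, practical tools usually run k-means several times from different seeds and keep the best clustering.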
Data Visualization – 20 Hours

  • Introduction to Data Visualization
  • Pentaho – Installation
  • Pentaho Report Designer (PRD)
  • Connecting PostgreSQL to Pentaho Tools
  • Pentaho Data Integration
  • Saiku Analytics in Pentaho

Big Data & Hadoop

Cloudera 5.3 Installation – 20 Hours

  • Introduction to the Hadoop Distributed File System (HDFS)
  • Understanding MapReduce Basics
  • Sqoop / ZooKeeper
  • HBase
  • Pig
  • Hive
  • Flume

Case Studies & Projects – 20 Hours

  • Case Studies
  • Supply Chain Optimization
  • Genome Analysis
  • Project Work
