Course Outline
- Section 1: Introduction to Hadoop
- Hadoop history, concepts
- ecosystem
- distributions
- high-level architecture
- Hadoop myths
- Hadoop challenges
- hardware / software
- labs: first look at Hadoop
- Section 2: HDFS Overview
- concepts (horizontal scaling, replication, data locality, rack awareness)
- architecture (NameNode, Secondary NameNode, DataNode)
- data integrity
- future of HDFS: NameNode HA, Federation
- labs: interacting with HDFS
- Section 3: MapReduce Overview
- MapReduce concepts
- daemons: JobTracker / TaskTracker
- phases: driver, mapper, shuffle/sort, reducer
- thinking in MapReduce
- future of MapReduce (YARN)
- labs: running a MapReduce program
- Section 4: Pig
- Pig vs. Java MapReduce
- Pig Latin language
- user-defined functions
- understanding Pig job flow
- basic data analysis with Pig
- complex data analysis with Pig
- working with multiple datasets in Pig
- advanced concepts
- lab: writing Pig scripts to analyze / transform data
- Section 5: Hive
- Hive concepts
- architecture
- SQL support in Hive
- data types
- table creation and queries
- Hive data management
- partitions & joins
- text analytics
- labs (multiple): creating Hive tables and running queries, joins, using partitions, using text analytics functions
- Section 6: BI Tools for Hadoop
- BI tools and Hadoop
- Overview of current BI tools landscape
- Choosing the best tool for the job
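As an illustration of the phases listed in Section 3 (driver, mapper, shuffle/sort, reducer), here is a minimal word-count sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names `mapper`, `shuffle_sort`, `reducer`, and `run_job` are hypothetical and chosen only to mirror the phase names in the outline.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_sort(pairs):
    """Shuffle/sort phase: group values by key, then order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Driver: wire the phases together in order (in-memory, single process)."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle_sort(mapped)]


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(run_job(sample))
```

On a real cluster the mapper and reducer would run as distributed tasks and the framework would perform the shuffle/sort; this sketch only shows how data flows between the phases.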
Requirements
- programming background with databases / SQL
- basic knowledge of Linux (able to navigate the Linux command line and edit files with vi / nano)
Lab environment
Zero Install: There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.
Students will need the following:
- an SSH client (Linux and Mac already include one; on Windows, PuTTY is recommended)
- a browser to access the cluster. We recommend Firefox with the FoxyProxy extension installed
Testimonials (4)
I thought he did a great job of tailoring the experience to the audience. This class is mostly designed to cover data analysis with HIVE, but me and my co-worker are doing HIVE administration with no real data analytics responsibilities.
Ian Reif - Franchise Tax Board
Course - Data Analysis with Hive/HiveQL
Professional market knowledge provided by an expert
Bartlomiej Srednicki - GP Strategies Poland sp. z o.o.
Course - Fintech: A Practical Introduction for Managers
Many hands-on sessions.
Jacek Pieczątka
Course - Administrator Training for Apache Hadoop
The practical, hands-on work; the theory was also presented well by Ajay.