Hadoop Fundamentals I
Version 2: Updated July 2013
Hadoop Fundamentals I teaches you the basics of Apache Hadoop and the concept of Big Data. The materials and software used in this course are all FREE! This is the second version of this course. Review the What's New? section for a list of changes made since version 1 of this course.
Welcome!
- About this course
- About your instructors
- What's New?
- Taking this course, a guided tour (7:01)
- Taking this course, a guided tour - Transcript
Technical assistance
- Course forum
Reading material and references
- Hadoop: The Definitive Guide (May 2012)
- Hadoop Essentials - A Quantitative Approach (Oct 2012)
- Hadoop in Action (Dec 2010)
Lesson 1: Introduction to Hadoop
Learning objectives
- Understand what Hadoop is
- Understand what Big Data is
- Learn about other open source software related to Hadoop
- Understand how Big Data solutions can work on the Cloud
Instructions
- Review all the videos provided
- Complete the lab
Videos
- What is Hadoop? - Part 1 (3:49)
- What is Hadoop? - Part 2 (4:31)
- What is Hadoop? - Transcript
Hands-on lab - Creating your own Hadoop cluster
We will use IBM InfoSphere BigInsights (BigInsights) software to work with Hadoop. BigInsights is available in several editions; this course uses the Quick Start Edition, which is free and has no time limit and no data size limit.
Step 1: Choose one of the following options to work with BigInsights.
Option 1: Download and install BigInsights
- Download BigInsights Quick Start Edition (free to use)
Option 2: Use BigInsights on the Amazon Cloud
- Review the "Hadoop and Amazon Cloud" course (BD005EN) for details
Option 3: Use BigInsights on the IBM SmartCloud Enterprise
- Review the "Hadoop and the IBM SmartCloud Enterprise" course (BD006EN) for details
Option 4: Download and use the supplied VMware image
- Download the 64-bit VMware image
- Download and install the free VMware Player to run the VMware image
- Use the supplied VMware image - User ID / password
Step 2: Set up lab input files
- Download and copy the lab input files to the right locations
Lab solution
- Lab solution (6:41)
Lesson 2: Hadoop architecture
Learning objectives
- Understand the main Hadoop components
- Learn how HDFS works
- List data access patterns for which HDFS is designed
- Describe how data is stored in an HDFS cluster
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Hadoop architecture and HDFS (8:01)
- Hadoop architecture and HDFS - Transcript
- Topology awareness and writing to HDFS (2:37)
- Topology awareness and writing to HDFS - Transcript
- HDFS Command Line (4:28)
- HDFS Command Line - Transcript
Hands-on lab
- Exploring HDFS - Lab instructions
- Lab solution (5:45)
Lesson 3: Introduction to MapReduce
Learning objectives
- Understand the concepts of map and reduce operations
- Describe how Hadoop executes a MapReduce job
- List MapReduce fault tolerance and scheduling features
- List MapReduce fundamental data types
- Describe a MapReduce data flow
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Map and Reduce operations - Introduction (4:21)
- Map and Reduce operations - Introduction - Transcript
- Submitting a MapReduce job (1:23)
- Submitting a MapReduce job - Transcript
- Distributed mergesort engine (1:11)
- Distributed mergesort engine - Transcript
- Fundamental data types (2:09)
- Fundamental data types - Transcript
- Fault tolerance (1:04)
- Fault tolerance - Transcript
- Scheduling and task execution (1:51)
- Scheduling and task execution - Transcript
Hands-on lab
- Using MapReduce - Lab instructions
Lesson 4: Querying data
Learning objectives
- Understand how to work with Pig, Hive and Jaql
Instructions
- Review all the videos provided
- Complete the lab
Videos
- An overview of Pig, Hive and Jaql (3:23)
- An overview of Pig, Hive and Jaql - Transcript
- Working with Pig (7:43)
- Working with Pig - Transcript
- Working with Hive (9:34)
- Working with Hive - Transcript
- Working with Jaql (4:28)
- Working with Jaql - Transcript
Hands-on lab
- Working with Jaql, Pig, and Hive - Lab instructions
- Working with Jaql, Pig and Hive - Lab solution Part 1 (5:01)
- Working with Jaql, Pig and Hive - Lab solution Part 2 (4:50)
- Working with Jaql, Pig and Hive - Lab solution Part 3 (5:07)
- Working with Jaql, Pig and Hive - Lab solution Part 4 (4:35)
Lesson 5: Hadoop administration
Learning objectives
- Understand how to add and remove nodes in a Hadoop cluster
- Learn how to monitor the health status of your cluster
- Learn how to configure Hadoop
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Adding and removing nodes to the cluster (7:46)
- Verifying cluster health & stopping/starting components (2:41)
- Configuring Hadoop - Part 1 (7:44)
- Configuring Hadoop - Part 2 (2:52)
- Setting up rack topology (1:52)
Hands-on lab
- Hadoop Administration - Lab instructions
- Hadoop Administration - Lab solution Part 1 (5:29)
- Hadoop Administration - Lab solution Part 2 (4:59)
- Hadoop Administration - Lab solution Part 3 (4:25)
- Hadoop Administration - Lab solution Part 4 (3:55)
Lesson 6: Moving data into Hadoop
Learning objectives
- Understand how to move data into Hadoop using Flume
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Introduction to Flume (4:42)
- Introduction to Flume - Transcript
- Flume modes of operation and configuration (3:39)
- Flume modes of operation and configuration - Transcript
Hands-on lab
- Data Movement - Lab instructions
Test
Test your knowledge
- Test objectives and instructions
- Take the test!
- Evaluation Form: Please provide feedback
- Print your certificate! (Available only after the activity "Evaluation Form: Please provide feedback" is marked complete and you achieve the required score in "Take the test!")
SQL Access for Hadoop
SQL Access for Hadoop teaches you how to use SQL to access big data stored in HDFS or HBase. The course presents the different options for SQL access, such as Hive, Impala, and Big SQL, and explains the similarities and differences among these three technologies. It includes hands-on exercises and access to a Hadoop cluster with Hive, HBase, HDFS, and Big SQL, so you can try these technologies firsthand. By the end of the course you will understand the different options for accessing big data with SQL and will have gained hands-on experience with these technologies.
Welcome!
- About this course
- About your instructors
- Taking this course, a guided tour (7:01)
- Taking this course, a guided tour - Transcript
Technical assistance
- Course forum
Reading material and references
- Hadoop in Action
Lesson 1: Introduction to Hive, Big SQL and Impala
Learning objectives
- Understand Hive, Big SQL and Impala concepts, terminology and architecture
- Understand similarities and differences between these technologies
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Lesson Outline (0:57)
- Lesson Outline - Transcript
- SQL for Big Data: Overview (5:43)
- SQL for Big Data - Transcript
- Introduction to Hive (8:31)
- Introduction to Hive - Transcript
- Introduction to Impala (7:08)
- Introduction to Impala - Transcript
- Introduction to Big SQL (9:38)
- Introduction to Big SQL - Transcript
Hands-on lab - Accessing a Hadoop Cluster on the Cloud
Follow the steps in this section to gain access to a Hadoop cluster on the Cloud.
- Accessing the Cloud Based Environment for Exercises (6:30)
- Accessing the Cloud Based Environment for Exercises - Transcript
- Using PuTTY with the IM Demo Cloud (5:17)
- Using PuTTY with the IM Demo Cloud - Transcript
Lesson 2: Working with SQL using Hive
Learning objectives
- Learn how to create tables and run HiveQL queries from the command line
Instructions
- Review all the videos provided
Videos
- Lesson Outline (0:45)
- Lesson Outline - Transcript
- Exploring and Configuring the Hive Environment (5:35)
- Exploring and Configuring the Hive Environment - Transcript
- Hive Tables (7:45)
- Hive Tables - Transcript
- Querying data with Hive (6:28)
- Querying data with Hive - Transcript
Hands-on lab
- Lab instructions - Working with Hive
Lesson 3: Working with SQL using Big SQL
Learning objectives
- Learn how to configure your Big SQL environment
- Learn how to create tables and run Big SQL queries
- Understand how to work with the JSQSH command line interface
- Understand how to work with a JDBC or ODBC client
Instructions
- Watch the videos in this lesson
- Review the lab instructions
Videos
- Exploring the Big SQL Environment (6:05)
- Exploring the Big SQL Environment - Transcript
- Starting, stopping and monitoring the Big SQL server process (4:14)
- Starting, stopping and monitoring the Big SQL server process - Transcript
- Configuring the Big SQL server (4:57)
- Configuring the Big SQL server - Transcript
- Getting started with JSQSH and connecting to a data source (10:56)
- Getting started with JSQSH and connecting to a data source - Transcript
- Creating and dropping schemas and tables (6:14)
- Creating and dropping schemas and tables - Transcript
- Loading tables and running queries (15:00)
- Loading tables and running queries - Transcript
- Working with Complex Data Types (7:19)
- Working with Complex Data Types - Transcript
- Connecting and running queries using JDBC and Eclipse (11:08)
- Connecting and running queries using JDBC and Eclipse - Transcript
Hands-on lab
- Lab instructions - Working with Big SQL
Lesson 4: Accessing HBase with Hive and Big SQL
Learning objectives
- Understand how to access HBase with Hive
- Understand how to access HBase with Big SQL
- Learn how to deal with HBase encoding and storage
Instructions
- Review all the videos provided
- Complete the lab
Videos
- HBase Support: Overview (8:22)
- HBase Support: Overview - Transcript
- Working with Big SQL and HBase (15:01)
- Working with Big SQL and HBase - Transcript
Hands-on lab
- Accessing HBase with SQL
Lesson 5: System Tables and Troubleshooting
Learning objectives
- Understand how to work with catalog and system tables in Big SQL
- Learn how to troubleshoot a problem in Big SQL
Instructions
- Review all the videos provided
- Complete the labs
Videos
- Troubleshooting in Big SQL (5:25)
- Troubleshooting in Big SQL - Transcript
- Inspecting Catalog and System Tables in Big SQL (3:11)
- Inspecting Catalog and System Tables in Big SQL - Transcript
Test
Test your knowledge
- Test objectives and instructions
- Take the test!
- Print your certificate! (Available only after you achieve the required score in "Take the test!")
Stream Computing I (Preview)
Stream Computing I teaches you the basics of stream computing using IBM InfoSphere Streams. This is the first in a series of two courses. The course and its materials are all FREE. A trial version of InfoSphere Streams is used for the labs.
Welcome!
- About this course
- Taking this course, a guided tour (7:01)
- Taking this course, a guided tour - Transcript
Technical assistance
- Course forum (input your feedback)
Download the course materials
- Download the VMware image (with a 90-day trial of Streams 3.1) for exercises
Reading material and references
- IBM InfoSphere Streams: Assembling Continuous Insight in the Information Revolution
Lesson 1: Introduction to Stream Computing
Learning objectives
- Understand what stream computing is all about
Instructions
- Review all the videos provided
- Complete the lab
Videos
- What is Stream Computing? (5:23)
- What is Stream Computing? - Transcript
- The evolution of analytics (4:30)
- The evolution of analytics - Transcript
- Event processing vs. stream computing (3:01)
- Event processing vs. stream computing - Transcript
- Use cases for stream computing (3:09)
- Use cases for stream computing - Transcript
- Introduction to IBM InfoSphere Streams (7:24)
- Introduction to IBM InfoSphere Streams - Transcript
Hands-on lab - Downloading and installing InfoSphere Streams
We will use IBM's InfoSphere Streams trial software to work with stream computing. This trial software can be used for 90 days and has all the features of the fee-based version.
- Download InfoSphere Streams (trial version)
- Install InfoSphere Streams - Instructions
Lesson 2: Streams concepts and terms
Learning objectives
- Understand Streams concepts such as instances, hosts, operators, processing elements (PEs), and jobs
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Streams instances and hosts (3:46)
- Streams instances and hosts - Transcript
- Operators and Processing Elements (5:27)
- Operators and Processing Elements - Transcript
- Components of Streams (4:36)
- Components of Streams - Transcript
- Streams Studio IDE (3:53)
Lesson 3: Streams applications
Learning objectives
- Learn to work with the Streams Processing Language (SPL)
- Get started with Streams applications
Instructions
- Review all the videos provided
- Complete the lab
Videos
- What is the Streams Processing Language (SPL)? (5:26)
- What is the Streams Processing Language (SPL)? - Transcript
Lesson 4: Composing an Application in Streams
Learning objectives
- Understand how to work with Streams operators such as Functor, Aggregate, InetSource, and more!
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Setting up the environment and the InetSource operator (7:24)
- Using the Custom operator (9:33)
- Using the Filter operator (6:34)
- Using the Sort operator and tumbling windows (10:43)
- Extracting values using Aggregate (7:42)
- Working with the Join operator (14:17)
- Selecting out columns using the Functor operator (9:44)
- Building an entire application with Drag and Drop in Streams 3.0 (36:17)
Lesson 5: Deploying Streams Applications
Learning objectives
- Understand how to deploy a Streams application
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Runtime architecture and introduction to topologies (5:36)
- Runtime architecture and introduction to topologies - Transcript
- Working with instances (2:00)
- Working with instances - Transcript
- Using StreamTool (4:52)
- Using StreamTool - Transcript
Spreadsheet-like Analytics
Spreadsheet-like Analytics teaches you how to explore big data and takes you on a journey of discovery without having to write a single line of code. Using BigSheets, a tool developed by IBM Research, you can perform analytics on big data through an interface similar to a regular spreadsheet. BigSheets hides the complexities of processing big data and lets analysts and managers concentrate on getting the analytics they want without having to know how to code.
Welcome!
- About this course
- Taking this course, a guided tour (7:01)
- Taking this course, a guided tour - Transcript
Technical assistance
- Course forum
Lesson 1: Getting started with BigSheets
Learning objectives
- Understand what BigSheets is
- Learn who the target users of BigSheets are
Instructions
- Review all the videos provided
Videos
- Introduction to BigSheets (3:49)
- What can you do with BigSheets? (1:11)
- Working with BigSheets (3:31)
- A tour of BigSheets - Part 1 (2:59)
- A tour of BigSheets - Part 2 (3:01)
Lesson 2: Discovering what BigSheets can do
Learning objectives
- Using a simple scenario, understand BigSheets features and capabilities
Instructions
- Review all the videos provided
Videos
- Gathering input data from an application (4:04)
- Manipulating data in BigSheets (3:26)
- Overview of other BigSheets scenarios (2:31)
Lesson 3: Deep Dive into BigSheets
Learning objectives
- Exploring data by adding sheets
- Understanding workflow and workbook diagrams
- Monitoring BigSheets in the Dashboard
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Exploring Data by Adding Sheets - Part 1 (6:32)
- Exploring Data by Adding Sheets - Part 1 - Transcript
- Exploring Data by Adding Sheets - Part 2 (7:40)
- Exploring Data by Adding Sheets - Part 2 - Transcript
- Exploring Data by Adding Sheets - Part 3 (8:02)
- Exploring Data by Adding Sheets - Part 3 - Transcript
- Exploring Data by Adding Sheets - Part 4 (7:58)
- Exploring Data by Adding Sheets - Part 4 - Transcript
- Exploring Data by Adding Sheets - Part 5 (6:46)
- Exploring Data by Adding Sheets - Part 5 - Transcript
- Understanding Workflow and Workbook Diagrams (5:04)
- Understanding Workflow and Workbook Diagrams - Transcript
- Monitoring BigSheets in Dashboard (4:26)
- Monitoring BigSheets in Dashboard - Transcript
Lesson 4: A complete case study using BigSheets
Learning objectives
- Understand how to work with BigSheets using a complete case study
Instructions
- Review all the videos provided
Videos
- BigSheets and the case study overview (2:12)
- Case Study - Part 1 (3:49)
- Case Study - Part 2 (2:42)
- Case Study - Part 3 (2:42)
- Case Study - Part 4 (2:42)
- Case Study - Part 5 (2:42)
- Case Study - Part 6 (1:13)
Java Fundamentals (Preview)
Brought to you by SciSpike (www.scispike.com)
Java Fundamentals teaches you the basics of the Java programming language. The skills you gain can also help you with Big Data technologies, since MapReduce jobs in Hadoop can be written in Java.
Course feedback (help us finish developing this course!)
- Course forum (input your feedback)
Lesson 1: Java overview
Learning objectives
- Learn about the history of Java
- Understand what the JVM, JRE, JDK, and Java APIs are
- Learn about Java editions
Instructions
- Complete all the presentations
Presentations
- Java Overview (SCORM package)
Lesson 5: Packages and Access Control
Learning objectives
- Understand what packages are
- Learn about the package naming convention
- Learn about access level modifiers (private, protected, public)
- Understand the import statement
Instructions
- Complete all the presentations
Presentations
- Packages and Access Control (SCORM package)
Lesson 7: Arrays
Learning objectives
- Learn what arrays are
- Understand the syntax for arrays in Java
- Learn how to work with arrays
- Compare arrays to collections
Instructions
- Complete all the presentations
Presentations
- Arrays (SCORM package)
Lesson 10: JavaBeans
Learning objectives
- Learn what JavaBeans are
- Implement the Serializable interface
- Learn about JavaBeans properties
- Understand what introspection is
Instructions
- Complete all the presentations
Presentations
- JavaBeans (SCORM package)
Lesson 12: Additional Features
Learning objectives
- Learn about the enhanced for loop (foreach)
- Understand what autoboxing is
- Learn about varargs
- Learn about static imports
- Understand how to work with annotations
Instructions
- Complete all the presentations
Presentations
- Additional Features (SCORM package)
Hadoop Reporting and Analysis
Brought to you by Jaspersoft (www.jaspersoft.com)
Hadoop Reporting and Analysis teaches you how to build your own Hadoop/Big Data reports over relevant Hadoop technologies such as HBase and Hive. It provides guidelines for choosing among several reporting techniques: Direct Batch Reports, Live Exploration, and Indirect Batch Analysis. Hands-on labs use the free version of Jaspersoft and BigInsights (IBM's Hadoop distribution). All materials and software used are FREE!
Welcome!
- About this course
- Taking this course, a guided tour (7:01)
- Taking this course, a guided tour - Transcript
Technical assistance
- Course forum
- Instructions to Download Jaspersoft Software
- Attachments
Lesson 1: Introduction to Reporting and Analysis on Hadoop
Learning objectives
- Understanding why reporting and analysis on Hadoop is important
- Approaches to Big Data reporting and analysis
- Big Data access technologies for reporting and analysis
- Business intelligence and Hadoop architecture
Instructions
- Review all the videos provided
Videos
- Introduction to Reporting and Analytics on Hadoop (14:11)
- Introduction to Reporting and Analytics on Hadoop - Transcript
Lesson 2: Direct Batch Reporting on Hadoop
Learning objectives
- Understanding Direct Batch Reporting
- Importance of Direct Batch Reporting on Hadoop
- Guidelines for choosing the Direct Batch Reporting approach
- Creating a Direct Batch Report on Hadoop
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Direct Batch Reporting (4:51)
- Direct Batch Reporting Demo (10:27)
Hands-on lab
- Creating direct batch reports for big data - Instructions
- Creating a big data direct batch report - Solution (11:36)
Lesson 3: Live Exploration of Big Data
Learning objectives
- Understanding Live Exploration of Big Data
- Guidelines for choosing the Live Exploration approach to Big Data analysis
- Performing Live Exploration of Big Data on Hadoop
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Live Exploration Reporting (5:22)
- Live Exploration Tutorial (10:43)
Hands-on lab
- Practice Live Exploration
- Practice Live Exploration - Solution (12:56)
Lesson 4: Indirect Batch Analysis on Hadoop
Learning objectives
- Understanding Indirect Batch Analysis on Hadoop
- Guidelines for choosing the Indirect Batch Analysis approach
- Performing Indirect Batch Analysis on Big Data
Instructions
- Review all the videos provided
- Complete the lab
Videos
- Indirect Batch Analysis of Big Data (5:50)
- Indirect Batch Analysis of Big Data - Demo (4:47)
Hands-on lab
- Indirect Batch Analysis - Lab Instructions
- Indirect Batch Analysis - Lab Solution (6:11)
Test
Test your knowledge
- Test objectives and instructions
- Take the test!
- Print your certificate! (Available only after you achieve the required score in "Take the test!")
Evaluation Form
- Evaluation Form: Please provide feedback