Big Data in Practice using Spark

If fast prototyping and processing speed are a priority in your Big Data environment, Spark will most likely be the platform of your choice. Apache Spark is an open source processing engine focusing on low latency, ease of use, and analytics. It's an alternative to the slower MapReduce approach delivered by Hadoop.

This course builds on the foundations laid in the course 'Big Data Concepts'.
In this course you will get hands-on practice on Linux with Spark and its libraries for machine learning and visualisation. You will also learn how to implement robust data processing in Scala with an SQL-style interface, and with the other APIs for Java and Python.

After successful completion of this course, you will have gained sufficient expertise to set up a big data environment, to import data into it, and to interrogate it using Spark. You will also be able to write simple Scala and Spark SQL programs that use the MLlib and GraphX libraries.

This course is also available for exclusive, one-company presentations.

What you will learn

On successful completion of this course you will be able to:

explain the concepts of Apache Spark and its components
set up a Big Data environment
implement data processing in Scala using an SQL-style interface
implement data processing with other APIs for Java and Python
write and debug programs for data analytic problems.

Who Should Attend

Anyone who wants to start working hands-on with big data: developers, data architects, and others who need to work with big data technology.

Prerequisites

Familiarity with the concepts of data stores, and more specifically of "Big Data", is necessary; see our course Big Data Concepts. Additionally, some knowledge of SQL and UNIX is useful. Experience with at least one programming language (Java, PHP, Python, Scala, C++ or C#) is a must.

Duration

2 days

Fee (per attendee)

£1470 (ex VAT)

This includes free online 24/7 access to course notes.

Hard copy course notes are available on request from rsmshop@rsm.co.uk at £50.00 plus carriage per set.

Course Code

BDSA

Motivation for Spark & Base Concepts

The Apache Spark project and its components; the Spark architecture and programming model.

Data Sources

Learn how to access data residing in Hadoop HDFS, Cassandra, or HBase.

Interfaces

Working with the various programming interfaces and the web interface; writing and debugging programs for simple data analytic problems.

Introduction to Hadoop HDFS, HBase, and Cassandra

Hadoop HDFS; HBase; Cassandra.