This event has passed.
Data transformation and analysis using Apache Spark: Melbourne, 12-13 August 2019
August 12 @ 9:30 am - August 13 @ 5:00 pm
Developed by Jeffrey Aven, author of SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python, this course will provide the core knowledge and skills needed to develop applications using Apache Spark.
The “Data Transformation and Analysis Using Apache Spark” module is the first of three modules in the “Big Data Development Using Apache Spark” series, and lays the foundations for subsequent modules including “Stream and Event Processing using Apache Spark” and “Advanced Analytics using Apache Spark”.
This course provides a detailed overview of the Spark runtime and application architecture as well as the fundamental concepts of the RDD and DataFrame APIs in Spark.
Basic primers on the map reduce processing pattern and functional programming using Python are provided as well.
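To give a flavour of what such a primer might cover, here is a minimal sketch of the map reduce pattern written in plain functional Python, with no Spark required. The sample sentences and the names `lines`, `pairs`, `merge_count` and `word_counts` are illustrative assumptions, not taken from the course material.

```python
from functools import reduce

# Hypothetical sample input (an assumption for illustration only).
lines = ["spark makes big data simple", "big data needs spark"]

# Split every line into words, then flatten into a single list of words.
words = [word for line in lines for word in line.split()]

# Map phase: emit a (word, 1) pair for every word.
pairs = list(map(lambda word: (word, 1), words))

def merge_count(counts, pair):
    """Reduce step: fold one (word, n) pair into the running totals."""
    word, n = pair
    # Return a new dict rather than mutating the accumulator.
    return {**counts, word: counts.get(word, 0) + n}

# Reduce phase: combine all pairs into one dictionary of totals.
word_counts = reduce(merge_count, pairs, {})

print(word_counts)
```

In Spark the same idea is expressed with RDD transformations and actions that run across a cluster; this sketch only shows the shape of the pattern on a single machine.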
The course will teach Spark programming using the transformations and actions available in the RDD and DataFrame APIs and within Spark SQL. Hands-on exercises are provided throughout the course to reinforce concepts.
Beyond basic programming skills, the course includes deep dives into further programming and runtime constructs such as broadcast variables, accumulators, and RDD and DataFrame storage and lineage options.
Topics covered include:
- Spark introduction and background
- Map reduce processing pattern
- Spark deployment modes
- Spark runtime and application architecture
- Understanding Spark RDDs
- Using Spark with distributed file systems and object stores
- Functional programming with Python
- Using Spark RDD transformations and actions
- RDD storage levels
- Caching, persistence and checkpointing of Spark RDDs
- Broadcast variables and accumulators
- Partitioning in Spark
- Processing RDDs with external programs
- Improving Spark application performance
- Apache Hive metastore overview
- DataFrame API and Spark SQL architecture
- Using the DataFrameReader and DataFrameWriter APIs
- Using DataFrame API transformations and actions
- Using Spark SQL
- Choosing between the RDD and DataFrame APIs
Who should attend?
This course is suitable for developers and analysts who will be working with Spark. It is ideally suited for users transitioning to a Spark runtime environment from a relational database programming or analysis background (eg, data warehouse/ETL developers or BI analysts).
Prerequisites
- General programming skills
- Basic Python programming skills
- Some data warehouse, BI or transactional database experience is preferable but not required
- Some prior exposure to Spark or Hadoop is preferable but not required
Attendees should, by the end of the course:
- Understand the Spark distributed processing framework and runtime architecture
- Understand the fundamentals of Spark programming using both the RDD and DataFrame APIs
- Have mastery over the basic transformations and actions in the Spark RDD API
- Have mastery over basic Spark DataFrame operations
- Be prepared for more advanced topics in Spark, including Spark Streaming and machine learning
The instructor: Jeffrey Aven
Jeffrey Aven is a big data, open source software, and cloud computing consultant, author and instructor based in Melbourne, Australia.
Jeffrey has extensive experience as a technical instructor, having taught courses on Hadoop and HBase for Cloudera (awarded Cloudera Hadoop Instructor of the Year for APAC in 2013) and courses on Apache Kafka for Confluent in addition to delivering his own courses.
Jeffrey is also the author of several Big Data related books including SAMS Teach Yourself Hadoop in 24 Hours, SAMS Teach Yourself Apache Spark in 24 Hours and Data Analytics with Spark using Python.
In addition to his credentials as an instructor and author, Jeff has over thirty years of industry experience and has been involved in key roles with several major big data and cloud implementations over the last several years.
Earlybird pricing is available until 26 July 2019.
Group discounts also apply during the earlybird period: 5% for 2–4 people, 10% for 5–6 people, 15% for 7–8 people, and 20% for 9 or more people. Select your desired quantity of tickets and click “Add to cart” to see the discount calculated before checkout.
About our training
Eugene Dubossarsky’s courses are unlike those offered in universities, online, or by private providers. His data-science classes, in particular, give clients not just knowledge of a process, but the real power of understanding the underlying concepts, allowing them to confidently practice, manage, promote and risk-assess data science.
Dr Dubossarsky says “the way many courses teach data science is like teaching people to memorise and recite poetry in a language they do not understand”. By contrast, he confers an understanding of that language, taught in an intuitive, accessible way that leaves trainees with an instinct for data science. Keeping formulae and mathematics to a bare minimum and taking an intuitive, visual approach, Eugene’s courses deliver a compressed mentoring experience as much as they do content. This is difficult for an average trainer to replicate. Trainees benefit from his extensive knowledge and over 20 years of commercial data-science experience, as well as his unique teaching style.
The resulting testimonials speak for themselves, and candidates come from all walks of life: CEOs, general managers, salespeople, IT professionals, marketing staff, public servants and of course people from many functions in the finance world. These testimonials are extensive, and many more are available on request. With specific regard to finance, Eugene has mentored and advised senior leaders and their teams in a number of major Australian banks.
Meals and refreshments
Catered morning tea and lunch are provided on both days of the course. Please notify us at least a week ahead if you have any special dietary requirements.
Questions and further details
Use firstname.lastname@example.org to email us any questions about the course, including requests for more detail, or for specific content you would like to see covered, or queries regarding prerequisites and suitability.
If you would like to attend but for any reason cannot, please also let us know.
Course material may vary from what is advertised, depending on the demands and learning pace of attendees. Additional material may be presented alongside, or in place of, the advertised content.
Cancellations and refunds
You can get a full refund if you cancel 2 weeks or more before the course starts. No refunds will be issued for cancellations made less than 2 weeks before the course starts.
Frequently asked questions (FAQ)
Do I need to bring my own computer?
There’s no need to bring your own laptop or PC. Our courses take place in modern, professional training facilities that have all the computing equipment you’ll need.
I'm lost! How do I find the venue?
Presciient training, coaching, mentoring and capability development for analytics
Please ask about tailored, in-house training courses, coaching analytics teams, executive mentoring and strategic advice and other services to build your organisation's strategic and operational analytics capability.
Our courses include:
- Predictive Analytics, Machine Learning, Data Science and AI
- Data Literacy for Everyone
- Introduction to R and Data Visualisation
- Introduction to Python for Data Analysis
- Forecasting and Trend Analytics
- Advanced Machine Learning Masterclass
- Advanced Masterclass 2: Random Forests
- Advanced R
- Quantum Computing
- Text and Language Analytics
- Fraud and Anomaly Detection
- Introduction to Machine Learning
- Introduction to Data Science
- Kaggle Boot Camp
By booking this course, you agree to our terms and conditions.
For any enquiries, please call +61 4 1457 3322.
If you prefer, you can pay by invoice rather than credit card. Just select “Pay by invoice” at the checkout.