Course outline

Data Science in Practice

Categories: Guaranteed To Run™, Pivotal


Duration: 5 Days

This course is designed to give participants hands-on experience with the Pivotal products used to carry out Pivotal Data Science projects. Given the diverse and varying nature of customer implementations, this course will focus on the main aspects of a Data Science project within Pivotal: Pivotal Greenplum DB, pSQL, MADlib, GPText, Pivotal HD, HAWQ, PivotalR, and pyMADlib, with extra units covering Alpine Chorus and visualization. Participants are introduced to the need for big, fast data and its role in modern business applications. The course will provide hands-on experience using Pivotal Greenplum DB, pSQL, MADlib, GPText, Apache Hadoop, Pivotal HD, HAWQ, Alpine Chorus, PivotalR, PL/R, pyMADlib, PL/Python, and several visualization tools such as Gephi, D3, and Tableau. This course will introduce and use pSQL, R, and Python, but does not include extensive training on them. Further, this course will provide attendees with an opportunity to explore several intense Data Science projects that have been converted into extensive Data Science exercises. This course does not teach installation, configuration, or management of any of the products.

At the end of the Data Science in Practice training course, participants will be able to:

  • Summarize the distinguishing characteristics of each Pivotal product and tool, and describe its most beneficial aspects from a Data Science perspective;
  • Evaluate each product and tool, and demonstrate hands-on practical skills with it;
  • Investigate, assess, and apply their knowledge to practical data science problems;
  • Practice Data Science problem-solving techniques in their respective endeavors.

As a result of attending the course, the Data Scientist will be able to confidently utilize the Pivotal product set and related technologies to analyze large data sets.

Prerequisites:

  • Willingness to participate in a demanding, high-intensity training experience.
  • Comfort with data analytics technologies (statistics, mathematics, machine learning, SQL, R, Python) is a plus.
  • A basic understanding of virtualization and massively parallel processing concepts.

Audience:

  • Experienced data analysts and data engineers willing to work hard to achieve superior Pivotal Data Science skills.
  • Anyone else who wants to learn about data science using the Pivotal product stack.

  1. Introduction
  2. Data science overview
    • Data Science: The Big Picture
    • Driving Forces
    • What Does a Data Scientist Do?
    • The Process of Data Science
    • What Does Pivotal Bring to the Story?
  3. Pivotal overview
    • Pivotal Corporate Overview
    • The Pivotal Big Data Suite
    • - Pivotal Greenplum DB
    • - Pivotal GPText
    • - MADlib
    • - Pivotal HD
    • - Pivotal on Virtualized Hardware
    • - Pivotal HAWQ
    • - Pivotal eXtension Framework (PXF)
    • - Pivotal Analytics Workbench
    • - Pivotal GemFire
    • - Pivotal GemFireXD
    • - Spring by Pivotal
    • - Spring XD
    • - Pivotal Labs and Pivotal Data Labs
  4. Pivotal Greenplum DB review including inline labs
    • Essentials
    • Getting Started and Inline Lab Exercise
    • Intro to pSQL and Inline Lab Exercises
    • - Creating Tables
    • - Distributions and Partitioning
    • - Indexes
    • - External Tables and Loading Data
    • Unloading Data
    • Analyze
    • Explain and Analyze
    • Vacuum
    • Monitoring
  5. Advanced SQL
    • Explore and Inline Lab Exercise
    • Joins and Inline Lab Exercise
    • Arrays and Array Aggregates and Inline Lab Exercise
    • Window Functions and Inline Lab Exercise
    • Other Functions and Inline Lab Exercise
    • User-Defined Functions (UDFs)
    • User-Defined Aggregates (UDAs)
    • Data Science Exercise
  6. MADlib including inline labs
    • MADlib Basics
    • Advanced MADlib
    • Data Science Exercise
  7. Text including inline labs
    • NLP: Practical Examples
    • NLP: Practical Examples with NLTK
    • Putting It All Together
    • Data Science Exercise
  8. Apache Hadoop and the Hadoop ecosystem including inline labs
    • Apache Hadoop Overview
    • - Core Component: HDFS
    • - Core Component: MapReduce
    • - MapReduce: Writing a Job
    • Hadoop Ecosystem
    • - Hadoop Streaming
    • - Pig
  9. Pivotal HD and HAWQ including inline labs
    • Intro to Pivotal HD and HAWQ
    • Getting Started with HAWQ
    • Working with HAWQ
    • External Tables: file, gpfdist, web
    • External Tables: PXF
    • Loading and Unloading Data and Inline Lab Exercises
    • Loading and Unloading using COPY
    • Loading and Unloading using INSERT
    • Loading and Unloading using gpfdist / gpload / external tables
    • Data Science Exercise
  10. GemFire (optional)
    • GemFire
  11. R and Python
    • PivotalR
    • PL/R
    • pyMADlib
    • PL/Python
    • Data Science Exercise
  12. Visualization
    • Tableau
    • R
    • Python
    • Exercises
  13. HAWQ Text Analytics, Airline Price Optimization, and Gene Sequencing Exercises
    • HAWQ Text Analytics Exercise
    • Airline Price Optimization Exercise
    • Gene Sequencing Exercise
