ETL and Data Processing Pipelines Using Shell, Airflow and Kafka
Learn about two different approaches to transforming raw data into data that is ready for analysis. One is the Extract, Transform, Load (ETL) process; the other, which reverses the last two steps, is the Extract, Load, Transform (ELT) process. ETL processes feed data warehouses and data marts, whereas ELT processes feed data lakes, where data is transformed on demand by the requesting application (a short sketch below contrasts the two patterns).

In this course, you will learn about the tools and techniques used in ETL and data pipelines. Both ETL and ELT extract data from source systems, move the data through a data pipeline, and store it in destination systems. You will learn how the two approaches differ and identify use cases for each. You will identify the techniques and tools used to extract data, combine the extracted data logically or physically, and load it into data warehouses. You will also determine which transformations should be applied to the source data to make it authoritative, contextual, and accessible to data users, and you will be able to describe techniques for loading data into a destination system, checking data quality, monitoring for loading failures, and recovering when a load fails.

By the end of this course, you will know how to use Apache Airflow to build data pipelines and understand the benefits of this approach (see the DAG sketch below). You will also learn how to use Apache Kafka to build streaming data pipelines and work with its main components: brokers, topics, partitions, replication, producers, and consumers (see the producer and consumer sketch below). Finally, you will complete a capstone project that lets you demonstrate the skills you have learned in each module.
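To make the ETL/ELT contrast concrete, here is a minimal sketch of the ETL pattern in Python. The file names, the "name" field, and the transformation are hypothetical placeholders, not materials from the course.

```python
# Minimal ETL sketch: extract rows from a source file, transform them
# in flight, then load the already-transformed rows into the destination.
# "source.csv", "warehouse.csv", and the "name" field are hypothetical.
import csv

def extract(path):
    """Extract: read raw rows from a source CSV file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: normalize a field before loading (ETL transforms first)."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]

def load(rows, path):
    """Load: write the transformed rows to the destination file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

load(transform(extract("source.csv")), "warehouse.csv")
```

Under ELT, the load step would instead copy the raw rows into the data lake unchanged, and the transform would run later, on demand, inside the destination system or the requesting application.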
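Airflow pipelines are defined as DAGs of tasks. Here is a minimal sketch, assuming Apache Airflow 2.4 or later (where the `schedule` argument replaced `schedule_interval`); the DAG id and the shell commands are hypothetical placeholders.

```python
# A minimal Airflow DAG chaining extract, transform, and load tasks.
# The dag_id and bash commands are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="sample_etl_pipeline",     # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Extract: pull raw data from the source (placeholder command)
    extract = BashOperator(task_id="extract",
                           bash_command="echo 'extracting raw data'")
    # Transform: clean and reshape the data (placeholder command)
    transform = BashOperator(task_id="transform",
                             bash_command="echo 'transforming data'")
    # Load: write results to the destination (placeholder command)
    load = BashOperator(task_id="load",
                        bash_command="echo 'loading data'")

    # The >> operator declares the dependencies Airflow uses to
    # schedule and monitor each run: extract, then transform, then load.
    extract >> transform >> load
```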
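And here is a minimal sketch of the Kafka producer and consumer roles, assuming the third-party kafka-python package and a broker listening on localhost:9092; the topic name "raw_events" is a hypothetical placeholder.

```python
# Minimal Kafka producer/consumer sketch using the kafka-python client.
# Assumes a broker at localhost:9092; the topic "raw_events" is hypothetical.
from kafka import KafkaConsumer, KafkaProducer

# Producer: publish a message to a topic; the broker appends it
# to one of the topic's partitions.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("raw_events", b"first event")
producer.flush()  # block until buffered messages are delivered

# Consumer: subscribe to the same topic and read messages back.
consumer = KafkaConsumer(
    "raw_events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the beginning of the partition
    consumer_timeout_ms=5000,      # stop iterating after 5 s of inactivity
)
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```

Each message lands in one partition of the topic, and replication copies partitions across brokers so that the failure of a single broker does not lose data.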