ETL is a type of data integration that refers to the three steps (extract, transform, load) used to blend data from multiple sources. It's often used to build a data warehouse. During this process, data is taken (extracted) from a source system, converted (transformed) into a format that can be analyzed, and stored (loaded) into a data warehouse or other system. Extract, load, transform (ELT) is an alternate but related approach designed to push processing down to the database for improved performance.
Businesses have relied on the ETL process for many years to get a consolidated view of the data that drives better business decisions. Today, this method of integrating data from multiple systems and sources is still a core component of an organization’s data integration toolbox.
When used with an enterprise data warehouse (data at rest), ETL provides deep historical context for the business.
By providing a consolidated view, ETL makes it easier for business users to analyze and report on data relevant to their initiatives. ETL can improve data professionals’ productivity because it codifies and reuses processes that move data without requiring technical skills to write code or scripts.
ETL has evolved over time to support emerging integration requirements for things like streaming data. Organizations need both ETL and ELT to bring data together, maintain accuracy and provide the auditing typically required for data warehousing, reporting and analytics.
ETL is used to move and transform data from many different sources and load it into various targets, like Hadoop.