Difference between revisions of "Data Lake"
Revision as of 13:02, 27 October 2020
A data lake is a central repository that stores structured and unstructured data from many sources and lets it flow on to downstream uses. The concept is akin to a lake fed by multiple streams: data fills the reservoir and is stored as is before it flows out to various applications within an organization.
Contents
Functions of a Data Lake
Data Ingestion
- Tools
Data Storage and Retention
- Tools
Data Processing
- Tools
Data Access
- Tools
Difference from Data Warehouse
A typical data warehouse stores its data in structured formats with a predetermined schema. The extract, transform, load (ETL) steps are done before the data is stored. A data lake takes data from sources in various forms and stores it as is. This allows for cheaper storage, and the ETL steps are performed when the data is sent to the end user for analysis or dashboards.
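The contrast can be illustrated with a minimal Python sketch. The record fields (user, amount, note) and the function names are hypothetical and chosen only for illustration; in-memory lists stand in for real storage. The warehouse path transforms records before storing them, while the lake path stores the raw lines unchanged and transforms them only when they are read.

```python
import json

# Hypothetical raw input: JSON lines whose fields vary from record to record.
RAW_RECORDS = [
    '{"user": "ana", "amount": "12.50", "note": "first order"}',
    '{"user": "bob", "amount": "7"}',
]


def transform(raw_line):
    """Parse one raw JSON line into the fixed (user, amount) schema."""
    record = json.loads(raw_line)
    return {"user": record["user"], "amount": float(record["amount"])}


def warehouse_load(raw_lines):
    """Warehouse style: transform first, then store only the structured rows."""
    return [transform(line) for line in raw_lines]


def lake_load(raw_lines):
    """Lake style: store the raw lines exactly as they arrived."""
    return list(raw_lines)


def lake_read(stored_lines):
    """Lake style: apply the transform only when the data is requested."""
    return [transform(line) for line in stored_lines]


warehouse_table = warehouse_load(RAW_RECORDS)  # schema applied before storage
lake_storage = lake_load(RAW_RECORDS)          # raw data kept as is
lake_view = lake_read(lake_storage)            # schema applied at read time
```

Both paths give the end user the same structured rows, but only the lake still holds the original records, including fields such as note that the fixed schema drops.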
Data Swamp
A data lake that becomes unruly is called a data swamp.
References
https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
Submitted by Tom Nahass