Revision as of 13:02, 27 October 2020

A data lake is a central repository that stores structured and unstructured data from many sources and lets that data flow on to downstream consumers. The concept is akin to a lake fed by multiple streams: each source fills the reservoir, and the data is stored as is before it flows out to various applications within an organization.
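As a rough illustration of storing data "as is", the sketch below lands three kinds of input (CSV, JSON, and free text) in a lake folder without parsing them. It assumes a local directory stands in for the lake's storage; the `ingest_raw` helper, the `raw/<source>/` layout, and the sample payloads are all hypothetical and not part of any particular product.

```python
import tempfile
from pathlib import Path


def ingest_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a payload byte-for-byte under a per-source folder (hypothetical layout)."""
    target_dir = lake_root / "raw" / source
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)  # no parsing and no schema check at ingestion time
    return target


# A temporary directory stands in for the lake's storage in this sketch.
lake_root = Path(tempfile.mkdtemp(prefix="lake_"))

# Structured (CSV), semi-structured (JSON), and unstructured (free text) sources.
csv_path = ingest_raw(lake_root, "sales", "orders.csv", b"id,amount\n1,9.50\n2,12.00\n")
json_path = ingest_raw(lake_root, "web", "click.json", b'{"user": "a1", "page": "/home"}')
text_path = ingest_raw(lake_root, "support", "ticket.txt", b"Customer cannot log in.")

# List what the lake now holds, relative to its root.
stored = sorted(
    p.relative_to(lake_root).as_posix() for p in lake_root.rglob("*") if p.is_file()
)
print(stored)
```

Each file is kept unchanged, so any interpretation of its contents is left to whichever application reads it later.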

Functions of a Data Lake

Data Ingestion

Tools

Data Storage and Retention

Tools

Data Processing

Tools

Data Access

Tools

Difference from Data Warehouse

A typical data warehouse stores its data in structured formats with a predetermined schema, so the extract, transform, load (ETL) steps are done before storage. A data lake takes data sources in various forms and stores them as is. This allows for cheaper storage, and the ETL steps are deferred until the data is sent to the end user for analysis or dashboards.
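The contrast above can be sketched in a few lines. This is a minimal illustration, not a real warehouse or lake: an in-memory dictionary plays the lake, and the function names (`warehouse_load`, `lake_store`, `lake_read_amounts`) and the sample CSV are assumptions made for the example.

```python
import csv
import io
from typing import Dict, List

# Sample input with one malformed row (hypothetical data).
RAW_CSV = "id,amount\n1,9.50\n2,not-a-number\n"


def warehouse_load(raw: str) -> List[dict]:
    """Warehouse style: transform and validate every row before storing it."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw)):
        # float()/int() raise ValueError on bad data, so the load fails up front.
        rows.append({"id": int(rec["id"]), "amount": float(rec["amount"])})
    return rows


def lake_store(store: Dict[str, str], key: str, raw: str) -> None:
    """Lake style: keep the source exactly as received."""
    store[key] = raw


def lake_read_amounts(store: Dict[str, str], key: str) -> List[float]:
    """Apply the schema only when a consumer reads; skip rows that do not fit."""
    amounts = []
    for rec in csv.DictReader(io.StringIO(store[key])):
        try:
            amounts.append(float(rec["amount"]))
        except ValueError:
            continue
    return amounts


# The warehouse path rejects the batch because one row violates the schema.
try:
    warehouse_load(RAW_CSV)
    warehouse_rejected = False
except ValueError:
    warehouse_rejected = True

# The lake path stores everything and defers interpretation to read time.
lake: Dict[str, str] = {}
lake_store(lake, "sales/orders.csv", RAW_CSV)
amounts = lake_read_amounts(lake, "sales/orders.csv")
```

Skipping malformed rows at read time is one possible policy chosen for this sketch; a consumer could equally raise an error or route bad rows elsewhere.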

Data Swamp

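A data lake that becomes unruly turns into a data swamp, for example when data is stored as is without enough organization or description for users to find and trust it.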

References

https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/

Submitted by Tom Nahass

