
What is the use of a lineage graph?

What is data lineage in data science?

  • Data lineage is the life cycle of data: the process of understanding, documenting, and visualizing data as it flows from its origin to its consumption. It covers every transformation applied to a dataset between its source and its final destination.
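To make the idea concrete, here is a minimal Python sketch. It assumes a hypothetical `TrackedDataset` wrapper (not part of any library) and a made-up source name, `orders.csv`. Each transformation returns a new dataset that carries the list of steps applied so far, so the final result records its full path from origin to destination.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackedDataset:
    """Hypothetical wrapper: rows plus a record of every step applied so far."""
    rows: List[dict]
    lineage: List[str] = field(default_factory=list)

    def transform(self, description: str,
                  fn: Callable[[List[dict]], List[dict]]) -> "TrackedDataset":
        # Each step returns a NEW dataset whose lineage extends the parent's;
        # the parent is left unchanged.
        return TrackedDataset(fn(self.rows), self.lineage + [description])


# Origin: raw records from a hypothetical source file.
raw = TrackedDataset(
    rows=[{"id": 1, "amount": "10"}, {"id": 2, "amount": "-3"}, {"id": 3, "amount": "7"}],
    lineage=["source: orders.csv"],
)

cleaned = raw.transform("cast amount to int",
                        lambda rows: [{**r, "amount": int(r["amount"])} for r in rows])
filtered = cleaned.transform("drop negative amounts",
                             lambda rows: [r for r in rows if r["amount"] >= 0])
report = filtered.transform("sum amounts",
                            lambda rows: [{"total": sum(r["amount"] for r in rows)}])

# Destination: the report, together with the path that produced it.
print(report.rows)  # [{'total': 17}]
for step in report.lineage:
    print(step)
```

Real lineage tools capture this metadata automatically and across systems; the sketch only shows the core idea of carrying the transformation history along with the data.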

Why do we need an RDD lineage graph?

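  • Spark relies on an RDD's lineage graph in two situations: when it computes a new RDD from existing ones, and when it must recover data lost from a persisted RDD. The graph is built up programmatically as each transformation is applied, so Spark can rebuild a lost partition by replaying the transformations that produced it.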
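The sketch below is a toy model of how a lineage graph makes recovery possible, written in plain Python rather than Spark. `ToyRDD` is hypothetical; its `map`, `filter`, `persist`, and `collect` methods only borrow the names of Spark's RDD operations. A derived RDD stores just its parent and the function to apply, so a lost partition can be recomputed by walking back through that chain.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical, heavily simplified RDD model (not Spark's API).

    A base RDD holds source partitions; a derived RDD holds only a parent
    reference and a per-partition function. That chain is the lineage.
    """

    def __init__(self, name: str, parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None,
                 source: Optional[List[List[int]]] = None):
        self.name = name
        self.parent = parent
        self.fn = fn
        self.source = source
        self.cache: Dict[int, List[int]] = {}  # "persisted" partitions

    def num_partitions(self) -> int:
        return len(self.source) if self.parent is None else self.parent.num_partitions()

    def map(self, name: str, f: Callable[[int], int]) -> "ToyRDD":
        # Lazy: only a new lineage edge is recorded, nothing is computed yet.
        return ToyRDD(name, parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, name: str, pred: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(name, parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute(self, i: int) -> List[int]:
        if i in self.cache:          # persisted partition still available
            return self.cache[i]
        if self.parent is None:      # reached the origin of the lineage
            return list(self.source[i])
        # Otherwise replay the lineage: compute the parent's partition, apply fn.
        return self.fn(self.parent.compute(i))

    def persist(self) -> "ToyRDD":
        for i in range(self.num_partitions()):
            self.cache[i] = self.compute(i)
        return self

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def lineage(self) -> List[str]:
        node, chain = self, []
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


base = ToyRDD("numbers", source=[[1, 2, 3], [4, 5, 6]])
squared = base.map("squared", lambda x: x * x).persist()
evens = squared.filter("evens", lambda x: x % 2 == 0)

print(evens.lineage())  # ['evens', 'squared', 'numbers']
print(evens.collect())  # [4, 16, 36]

# Simulate losing one persisted partition (e.g. its executor went away).
del squared.cache[1]
print(evens.collect())  # [4, 16, 36]: partition 1 recomputed from 'numbers'
```

In Spark itself, `toDebugString` prints an RDD's lineage, and the scheduler uses the same parent links to recompute only the partitions that were lost.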

What is the difference between forward data lineage and backward data lineage?

  • How lineage is represented depends largely on the scope of metadata management and on the reference point of interest. Backward data lineage traces from the reference point back to the data's sources and the intermediate hops along the way. Forward data lineage traces from the reference point on to the final destination's data points and the intermediate data flows that lead there.
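Backward and forward lineage are two traversal directions over the same graph. The following Python sketch assumes a small, made-up pipeline (all dataset names are hypothetical) and uses a breadth-first search: following edges upstream from a reference point gives backward lineage, and following them downstream gives forward lineage.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical pipeline: each edge means "data flows from A into B".
EDGES: List[Tuple[str, str]] = [
    ("crm_db", "staging_customers"),
    ("web_logs", "staging_events"),
    ("staging_customers", "customer_360"),
    ("staging_events", "customer_360"),
    ("customer_360", "churn_model"),
    ("customer_360", "sales_dashboard"),
]


def build_graph(edges: List[Tuple[str, str]]):
    """Build adjacency maps for both directions."""
    downstream: Dict[str, List[str]] = defaultdict(list)
    upstream: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        upstream[dst].append(src)
    return downstream, upstream


def traverse(start: str, adjacency: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk; returns every node reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream, upstream = build_graph(EDGES)
backward = traverse("customer_360", upstream)    # sources and intermediate hops
forward = traverse("customer_360", downstream)   # final destinations

print(sorted(backward))  # ['crm_db', 'staging_customers', 'staging_events', 'web_logs']
print(sorted(forward))   # ['churn_model', 'sales_dashboard']
```

Building both adjacency maps once keeps each direction a plain BFS; the reference point (`customer_360` here) is simply the node you start from.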
