Ask AI

You are viewing an unreleased or outdated version of the documentation

Airflow Federation Tutorial#

This tutorial demonstrates using dagster-airlift to observe DAGs from multiple Airflow instances, and federate execution between them using Dagster as a centralized control plane.

Using dagster-airlift we can

  • Observe Airflow DAGs and their execution history
  • Directly trigger Airflow DAGs from Dagster
  • Set up federated execution across Airflow instances

All of this can be done with no changes to Airflow code.

Overview#

This tutorial will take you through an imaginary data platform team that has the following scenario:

  • An Airflow instance warehouse, run by another team, that is responsible for loading data into a data warehouse.
  • An Airflow instance metrics, run by the data platform team, that deploys all the metrics constructed by data scientists on top of the data warehouse.

Two DAGs have been causing a lot of pain lately for the team: warehouse.load_customers and metrics.customer_metrics. The warehouse.load_customers DAG is responsible for loading customer data into the data warehouse, and the metrics.customer_metrics DAG is responsible for computing metrics on top of the customer data. There's a cross-instance dependency relationship between these two DAGs, but it's not observable or controllable. The data platform team would ideally only like to rebuild the metrics.customer_metrics DAG when the warehouse.load_customers DAG has new data. In this guide, we'll use dagster-airlift to observe the warehouse and metrics Airflow instances, and set up a federated execution controlled by Dagster that only triggers the metrics.customer_metrics DAG when the warehouse.load_customers DAG has new data. This process won't require any changes to the Airflow code.

Pages#