
How to Create First DAG in Airflow?

A Directed Acyclic Graph (DAG) is a collection of all the individual tasks we run, organized in an ordered fashion. In other words, a DAG is a data pipeline in Airflow. In a DAG:

  • There are no loops
  • Edges are directed
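The "directed, no loops" property is what makes a DAG schedulable: following the edges can never bring you back to a task that already ran. As a rough pure-Python sketch (not Airflow code), a graph of task dependencies can be modelled as an adjacency list and checked for loops with a topological sort:

```python
from collections import deque

def topological_order(dag):
    """Return the tasks in a runnable order, or raise if the graph has a loop."""
    # Count incoming edges (upstream dependencies) for every task
    indegree = {task: 0 for task in dag}
    for downstream in dag.values():
        for task in downstream:
            indegree[task] += 1
    # Tasks with no upstream dependencies can run first
    ready = deque(task for task, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for downstream in dag[task]:
            indegree[downstream] -= 1
            if indegree[downstream] == 0:
                ready.append(downstream)
    if len(order) != len(dag):
        raise ValueError("graph has a loop, so it is not a DAG")
    return order

# 'start' runs before both 'a' and 'b'; both run before 'end'
print(topological_order({'start': ['a', 'b'], 'a': ['end'], 'b': ['end'], 'end': []}))
```

If a loop is present (say `a` depends on `b` and `b` on `a`), no valid order exists, which is exactly why Airflow rejects cyclic graphs.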

Key Terminologies:

  • Operator: An operator describes a single task in your DAG. In Airflow, the nodes of the DAG are operators.
  • Dependencies: The specified relationships between your operators are known as dependencies. In Airflow, the directed edges of the DAG are dependencies.
  • Tasks: Tasks are the units of work in Airflow. Each task is an instance of an operator (sensors are a special kind of operator).
  • Task Instances: A task instance is a run of a task at a point in time. These are the runnable entities, and each task instance belongs to a DagRun.

A DAG file is a Python file that specifies both the structure and the code of the DAG.

Steps To Create an Airflow DAG

  1. Import the right modules for your DAG
  2. Create default arguments for the DAG
  3. Create a DAG object
  4. Create tasks
  5. Set up dependencies for the DAG

Now, let’s discuss these steps one by one in detail and create a simple DAG.

Step 1: Importing the right modules for your DAG

To create a DAG, we first import all the modules that our DAG file will use. The first and most important import is the DAG class from the airflow package, which is used to instantiate the DAG object. Next, we import the datetime and timedelta modules for scheduling, and finally the operators we will use. Here, we import only the DummyOperator.

# To initiate the DAG Object
from airflow import DAG
# Importing datetime and timedelta modules for scheduling the DAGs
from datetime import timedelta, datetime
# Importing operators
from airflow.operators.dummy_operator import DummyOperator

Step 2: Create default arguments for the DAG 

default_args is a dictionary that we pass to the DAG object; it contains default metadata for the DAG, and its values are applied to every operator in the DAG unless a task overrides them.

Let’s create a dictionary named default_args:

# Initiating the default_args
default_args = {
        'owner' : 'airflow',
        'start_date' : datetime(2022, 11, 12)
}

  • owner is the owner of the DAG
  • start_date is the date from which the DAG starts getting scheduled

We can add more such parameters to our arguments, as per our requirement.
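For example, retries and retry_delay are two commonly used default_args keys in Airflow; the values below are only illustrative:

```python
from datetime import datetime, timedelta

# An extended default_args dictionary; every operator in the DAG
# inherits these values unless a task overrides them explicitly.
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2022, 11, 12),
    'retries': 2,                          # retry a failed task up to 2 times
    'retry_delay': timedelta(minutes=5),   # wait 5 minutes between retries
}
print(sorted(default_args))
```

Because these defaults flow into every operator, a retry policy set here applies to all tasks in the DAG without repeating it per task.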

Step 3: Creating DAG Object

After defining default_args, we create the DAG object by passing a unique identifier called the dag_id. Here we name it DAG-1.

So, let’s create a DAG Object.

# Creating DAG Object
dag = DAG(dag_id='DAG-1',
        default_args=default_args,
        schedule_interval='@once', 
        catchup=False
    )

Here, 

  • dag_id is the unique identifier for the DAG.
  • schedule_interval defines how frequently our DAG will be triggered. It can be once, hourly, daily, weekly, monthly, or yearly. None means that we do not want to schedule the DAG and will trigger it manually.
  • catchup – By default, catchup is True, which means Airflow will run the DAG for all past schedule intervals between start_date and the current interval. If we want execution to begin only from the current interval, we set catchup to False.
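The named schedule presets are shorthand for standard cron expressions. The mapping below lists the cron equivalents Airflow documents for each preset, written as a plain dictionary purely for illustration ('@once' is omitted because it has no cron form — it runs a single time):

```python
# Cron equivalents of Airflow's schedule_interval presets
schedule_presets = {
    '@hourly':  '0 * * * *',   # top of every hour
    '@daily':   '0 0 * * *',   # midnight every day
    '@weekly':  '0 0 * * 0',   # midnight every Sunday
    '@monthly': '0 0 1 * *',   # midnight on the first of the month
    '@yearly':  '0 0 1 1 *',   # midnight on January 1st
}
print(schedule_presets['@daily'])  # → 0 0 * * *
```

A raw cron string can also be passed directly as the schedule_interval when none of the presets fit.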

Step 4: Create tasks

A task is an instance of an operator, identified by a unique task_id. There are many operators available, but here we will use the DummyOperator. We can create different tasks using different operators; here we will create two simple tasks:

# Creating first task
start = DummyOperator(task_id='start', dag=dag)

If you go to the graph view in the UI, you can see that the task “start” has been created.


# Creating second task
end = DummyOperator(task_id='end', dag=dag)

Now the two tasks, start and end, have been created.


Step 5: Setting up dependencies for the DAG

Dependencies are the relationships between the operators, i.e. the order in which the tasks in a DAG will be executed. We set the order of execution using the bitshift operators: >> sets a downstream dependency and << sets an upstream dependency.

  • a >> b means that first, a will run, and then b will run. It can also be written as a.set_downstream(b).
  • a << b means that first, b will run which will be followed by a. It can also be written as a.set_upstream(b).
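Under the hood, Airflow supports this syntax by overloading Python's shift operators on tasks. A simplified pure-Python sketch of the idea (the MiniTask class is made up for illustration; it is not an Airflow API):

```python
class MiniTask:
    """A toy stand-in for an Airflow operator, for illustration only."""
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []   # tasks that run after this one
        self.upstream = []     # tasks that run before this one

    def set_downstream(self, other):
        self.downstream.append(other)
        other.upstream.append(self)

    def __rshift__(self, other):   # invoked by: self >> other
        self.set_downstream(other)
        return other               # returning `other` allows chaining: a >> b >> c

    def __lshift__(self, other):   # invoked by: self << other
        other.set_downstream(self)
        return other

start = MiniTask('start')
end = MiniTask('end')
start >> end
print([t.task_id for t in start.downstream])  # → ['end']
```

Returning the right-hand task from `__rshift__` is what makes chains like `a >> b >> c` read naturally as a left-to-right pipeline.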

Now, let’s set the order of execution between the start and end tasks. Suppose we want start to run first and end to run after it.

# Setting up dependencies 
start >> end 
# We can also write it as start.set_downstream(end) 

Here are start and end after setting up the dependency:


Putting all our code together:

# Step 1: Importing Modules
# To initiate the DAG Object
from airflow import DAG
# Importing datetime and timedelta modules for scheduling the DAGs
from datetime import timedelta, datetime
# Importing operators
from airflow.operators.dummy_operator import DummyOperator

# Step 2: Initiating the default_args
default_args = {
        'owner' : 'airflow',
        'start_date' : datetime(2022, 11, 12),
}

# Step 3: Creating DAG Object
dag = DAG(dag_id='DAG-1',
        default_args=default_args,
        schedule_interval='@once',
        catchup=False
    )

# Step 4: Creating tasks
# Creating first task
start = DummyOperator(task_id='start', dag=dag)
# Creating second task
end = DummyOperator(task_id='end', dag=dag)

# Step 5: Setting up dependencies
start >> end

Now we have successfully created our first DAG. We can open the webserver to see it in the UI.


Now you can click on the DAG and explore its different views in the Airflow UI.
