
How to Deploy Apache Airflow on Vultr Using Anaconda — SitePoint


In this article, we’re going to deploy an Airflow application in a Conda environment, secure the application using Nginx, and request an SSL certificate from Let’s Encrypt.

Airflow is a popular tool that we can use to define, schedule, and monitor our complex workflows. We can create Directed Acyclic Graphs (DAGs) to automate tasks across our work platforms, and being open source, Airflow has a community that provides support and improves it continuously.
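
To give a sense of what a DAG looks like, here’s a minimal sketch in Python (the dag_id, task_id, and schedule below are our own illustrative choices, not part of this deployment). A file like this would typically live in Airflow’s dags folder (~/airflow/dags by default):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # A tiny DAG with a single task that prints a message once a day.
    with DAG(
        dag_id="hello_vultr",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="say_hello",
            bash_command="echo 'Hello from Airflow'",
        )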

This is a sponsored article by Vultr. Vultr is the world’s largest privately-held cloud computing platform. A favorite with developers, Vultr has served over 1.5 million customers across 185 countries with flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. Learn more about Vultr.

Deploying a Server on Vultr

Let’s start by deploying a Vultr server with the Anaconda marketplace application.

  1. Sign up and log in to the Vultr Customer Portal.

  2. Navigate to the Products page.

  3. Select Compute from the side menu.

  4. Click Deploy Server.

  5. Select Cloud Compute as the server type.

  6. Choose a Location.

  7. Select Anaconda among the marketplace applications.

    Vultr Anaconda marketplace app selection

  8. Select a Plan.

  9. Select any additional features as required in the “Additional Features” section.

  10. Click the Deploy Now button.

    Vultr server deploy button

Creating a Vultr Managed Database

After deploying a Vultr server, we’ll next deploy a Vultr Managed Database for PostgreSQL. We’ll also create two new databases in our database instance that will be used to connect with our Airflow application later in the article.

  1. Open the Vultr Customer Portal.

  2. Click the Products menu group and navigate to Databases to create a PostgreSQL managed database.

    Vultr Database products menu button

  3. Click Add Managed Databases.

  4. Select PostgreSQL with the latest version as the database engine.

    Vultr managed PostgreSQL selection

  5. Select the Server Configuration and Server Location.

  6. Write a Label for the service.

    Label button managed database

  7. Click Deploy Now.

    Vultr managed database deploy button

  8. After the database is deployed, select Users & Databases.

    Vultr managed database users and database section

  9. Click Add New Database.

  10. Type in a name, click Add Database, and name it airflow-pgsql.

  11. Repeat steps 9 and 10 to add another database in the same managed database and name it airflow-celery.

Getting Started with Conda and Airflow

Now that we’ve created a Vultr Managed PostgreSQL instance, we’ll use the Vultr server to create a Conda environment and install the required dependencies.

  1. Check the Conda version:

    $ conda --version
  2. Create a Conda environment:

    $ conda create -n airflow python=3.8
  3. Activate the environment:

    $ conda activate airflow
  4. Install the Redis server:

    (airflow) $ apt install -y redis-server
  5. Enable the Redis server:

    (airflow) $ sudo systemctl enable redis-server
  6. Check the status:

    (airflow) $ sudo systemctl status redis-server

    Redis server status check

  7. Install the Python package manager:

    (airflow) $ conda install pip
  8. Install the required dependencies:

    (airflow) $ pip install psycopg2-binary virtualenv redis
  9. Install Airflow in the Conda environment:

    (airflow) $ pip install "apache-airflow[celery]==2.8.1" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.8.txt"
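
To confirm that the installation worked, we can optionally check the Airflow version from inside the Conda environment (a quick sanity check, not part of the original steps):

    (airflow) $ airflow version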

Connecting Airflow with Vultr Managed Database

After preparing the environment, let’s connect our Airflow application with the two databases we created earlier inside our database instance, and make the necessary changes to the Airflow configuration to make our application production-ready.

  1. Set an environment variable for the database connection:

    (airflow) $ export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="postgresql://user:password@hostname:port/db_name"

    Make sure to replace user, password, hostname, and port with the actual values from the connection details section after selecting the airflow-pgsql database. Replace db_name with airflow-pgsql.

    airflow-pgsql database credential selection

  2. Initialize the metadata database.

    We must initialize a metadata database for Airflow to create the necessary tables and schema that store information such as DAGs and other data related to our workflows:

    (airflow) $ airflow db init
  3. Open the Airflow configuration file:

    (airflow) $ sudo nano ~/airflow/airflow.cfg
  4. Scroll down and change the executor:

    executor = CeleryExecutor
  5. Link the Vultr Managed PostgreSQL database, and change the value of sql_alchemy_conn:

    sql_alchemy_conn = "postgresql://user:password@hostname:port/db_name"

    Make sure to replace user, password, hostname, and port with the actual values from the connection details section after selecting the airflow-pgsql database. Replace db_name with airflow-pgsql.

  6. Scroll down and change the worker and trigger log ports:

    worker_log_server_port = 8794
    trigger_log_server_port = 8795
  7. Change the broker_url:

    broker_url = redis://localhost:6379/0
  8. Remove the # and change the result_backend:

    result_backend = db+postgresql://user:password@hostname:port/db_name

    Make sure to replace user, password, hostname, and port with the actual values from the connection details section after selecting the airflow-celery database. Replace db_name with airflow-celery.

    airflow-celery database credential selection

  9. Save and exit the file.

  10. Create an Airflow user:

    (airflow) $ airflow users create \
        --username admin \
        --firstname Peter \
        --lastname Parker \
        --role Admin \
        --email spiderman@superhero.org

    Make sure to replace all the variable values with the actual values.

    Enter a password when prompted to set it for the user accessing the dashboard.
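
As an optional sanity check before moving on (not one of the original steps), we can ask Airflow to verify that it can reach the metadata database we just configured:

    (airflow) $ airflow db check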

Daemonizing the Airflow Application

Now let’s daemonize our Airflow application so that it runs in the background and continues to run independently even when we close the terminal and log out.

These steps will also help us create persistent services for the Airflow webserver, scheduler, and Celery workers.

  1. View the airflow path:

    (airflow) $ which airflow

    Copy the path to the clipboard.

  2. Create an Airflow webserver service file:

    (airflow) $ sudo nano /etc/systemd/system/airflow-webserver.service
  3. Paste the service configurations into the file.

    airflow webserver is responsible for providing a web-based user interface that allows us to interact with and manage our workflows. These configurations will create a background-running service for our Airflow webserver:

    [Unit]
    Description="Airflow Webserver"
    After=network.target
    
    [Service]
    User=example_user
    Group=example_user
    ExecStart=/home/example_user/.local/bin/airflow webserver
    
    [Install]
    WantedBy=multi-user.target

    Make sure to replace User and Group with your actual non-root sudo user account details, and replace the ExecStart path with the actual Airflow path, including the executable binary we copied earlier to the clipboard.

  4. Save and close the file.

  5. Enable the airflow-webserver service, so that the webserver automatically starts up during the system boot process:

    (airflow) $ systemctl enable airflow-webserver
  6. Start the service:

    (airflow) $ sudo systemctl start airflow-webserver
  7. Make sure the service is up and running:

    (airflow) $ sudo systemctl status airflow-webserver

    Our output should appear similar to the one pictured below.

    airflow-webserver service status check

  8. Create an Airflow Celery service file:

    (airflow) $ sudo nano /etc/systemd/system/airflow-celery.service
  9. Paste the service configurations into the file.

    airflow celery worker starts a Celery worker. Celery is a distributed task queue that allows us to distribute and execute tasks across multiple workers. The workers connect to our Redis server to receive and execute tasks:

    [Unit]
    Description="Airflow Celery"
    After=network.target
    
    [Service]
    User=example_user
    Group=example_user
    ExecStart=/home/example_user/.local/bin/airflow celery worker
    
    [Install]
    WantedBy=multi-user.target

    Make sure to replace User and Group with your actual non-root sudo user account details, and replace the ExecStart path with the actual Airflow path, including the executable binary we copied earlier to the clipboard.

  10. Save and close the file.

  11. Enable the airflow-celery service:

    (airflow) $ sudo systemctl enable airflow-celery
  12. Start the service:

    (airflow) $ sudo systemctl start airflow-celery
  13. Make sure the service is up and running:

    (airflow) $ sudo systemctl status airflow-celery
  14. Create an Airflow scheduler service file:

    (airflow) $ sudo nano /etc/systemd/system/airflow-scheduler.service
  15. Paste the service configurations into the file.

    airflow scheduler is responsible for scheduling and triggering the DAGs and the tasks defined in them. It also checks the status of DAGs and tasks periodically:

    [Unit]
    Description="Airflow Scheduler"
    After=network.target
    
    [Service]
    User=example_user
    Group=example_user
    ExecStart=/home/example_user/.local/bin/airflow scheduler
    
    [Install]
    WantedBy=multi-user.target

    Make sure to replace User and Group with your actual non-root sudo user account details, and replace the ExecStart path with the actual Airflow path, including the executable binary we copied earlier to the clipboard.

  16. Save and close the file.

  17. Enable the airflow-scheduler service:

    (airflow) $ sudo systemctl enable airflow-scheduler
  18. Start the service:

    (airflow) $ sudo systemctl start airflow-scheduler
  19. Make sure the service is up and running:

    (airflow) $ sudo systemctl status airflow-scheduler

    Our output should appear like the one pictured below.

    airflow-scheduler service status check
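
Note: on some systems, newly created unit files aren’t picked up until systemd reloads its configuration. If any of the services above fail to enable or start, reloading the daemon and retrying usually helps (an extra step, not part of the original guide):

    (airflow) $ sudo systemctl daemon-reload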

Setting Up Nginx as a Reverse Proxy

We’ve created persistent services for the Airflow application, so now we’ll set up Nginx as a reverse proxy to enhance our application’s security and scalability, following the steps outlined below.

  1. Log in to the Vultr Customer Portal.

  2. Navigate to the Products page.

  3. From the side menu, expand the Network drop-down, and select DNS.

  4. Click the Add Domain button in the center.

  5. Follow the setup procedure to add your domain name by selecting the IP address of your server.

  6. Set the following hostnames as your domain’s primary and secondary nameservers with your domain registrar:

    • ns1.vultr.com
    • ns2.vultr.com
  7. Install Nginx:

    (airflow) $ apt install nginx
  8. Make sure to check that the Nginx server is up and running:

    (airflow) $ sudo systemctl status nginx
  9. Create a new Nginx virtual host configuration file in the sites-available directory:

    (airflow) $ sudo nano /etc/nginx/sites-available/airflow.conf
  10. Add the configurations to the file.

    These configurations will direct the traffic on our application from the actual domain to the backend server at http://127.0.0.1:8080 using a proxy pass:

    server {
    
        listen 80;
        listen [::]:80;
        server_name airflow.example.com;
    
        location / {
            proxy_pass http://127.0.0.1:8080;
        }
    
    }

    Make sure to replace airflow.example.com with the actual domain we added in the Vultr dashboard.

  11. Save and close the file.

  12. Link the configuration file to the sites-enabled directory to activate the configuration file:

    (airflow) $ sudo ln -s /etc/nginx/sites-available/airflow.conf /etc/nginx/sites-enabled/
  13. Make sure to check the configuration for errors:

    (airflow) $ sudo nginx -t

    Our output should appear like the one pictured below.

    nginx configuration check

  14. Restart Nginx to apply the changes:

    (airflow) $ sudo systemctl reload nginx
  15. Allow HTTP port 80 through the firewall for all the incoming connections:

    (airflow) $ sudo ufw allow 80/tcp
  16. Allow HTTPS port 443 through the firewall for all incoming connections:

    (airflow) $ sudo ufw allow 443/tcp
  17. Reload the firewall rules to save the changes:

    (airflow) $ sudo ufw reload
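
If ufw isn’t already active on the server, the reload above won’t take effect until the firewall is enabled. In that case, allow SSH first so we don’t lock ourselves out (an extra precaution, not part of the original steps):

    (airflow) $ sudo ufw allow 22/tcp
    (airflow) $ sudo ufw enable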

Applying a Let’s Encrypt SSL Certificate to the Airflow Application

The last step is to apply a Let’s Encrypt SSL certificate to our Airflow application so that it becomes much more secure and protects our application from unwanted attacks.

  1. Using Snap, install the Certbot Let’s Encrypt client:

    (airflow) $ snap install --classic certbot
  2. Get a new SSL certificate for our domain:

    (airflow) $ certbot --nginx -d airflow.example.com

    Make sure to replace airflow.example.com with our actual domain name.
    When prompted, enter an email address and press Y to accept the Let’s Encrypt terms.

  3. Test that the SSL certificate auto-renews upon expiry.

    Auto-renewal makes sure our SSL certificates are up to date, reducing the risk of certificate expiry and maintaining the security of our application:

    (airflow) $ certbot renew --dry-run
  4. Use a web browser to open our Airflow application: https://airflow.example.com.

    When prompted, enter the username and password we created earlier.

    airflow dashboard login

    Upon accessing the dashboard, all the DAGs that are provided by default will be visible.

    airflow dashboard
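
As an optional smoke test, we can unpause and trigger one of the example DAGs that ship with Airflow from the CLI and watch it run in the dashboard. This assumes the bundled example DAGs are enabled (they are by default via load_examples in airflow.cfg):

    (airflow) $ airflow dags unpause example_bash_operator
    (airflow) $ airflow dags trigger example_bash_operator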

Conclusion

In this article, we demonstrated how to create Conda environments, deploy a production-ready Airflow application, and improve the performance and security of an application.
