Here are some "best practices" that we use at MINDIMENSIONS:
Version Control (Git) for DAGs, Plugins, Scripts, Tests and Configuration
We store all DAGs, plugins, custom operators, tests and Airflow configurations in a Git repository following the structured layout below:
airflow-project/
├── dags/ # DAG files
├── plugins/ # Custom operators/hooks
├── scripts/ # Deployment/helper scripts
├── tests/ # Tests
├── requirements.txt # Python dependencies
├── Dockerfile # For containerized deployments
└── airflow.cfg # (Optional) Configuration overrides
DAG Deployment Isolation from Core Airflow
We decouple DAGs from the core Airflow infrastructure, either by mounting them as volumes in Kubernetes/Docker or by syncing them from cloud storage (S3/GCS) to the VMs running Airflow.
Environment Separation
We maintain separate environments:
- Development: local/dev Airflow instances
- Staging: mirrors production, for validation
- Production: stable and monitored
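As an illustration of how a DAG can adapt to the environment it runs in, here is a minimal sketch; the AIRFLOW_ENV variable is a hypothetical convention set by the deployment (it is not an Airflow built-in), the DAG itself is made up, and the schedule argument assumes Airflow 2.4+:

import os

import pendulum
from airflow import DAG
from airflow.operators.empty import EmptyOperator

# Hypothetical convention: the deployment (Helm values, docker-compose, etc.)
# sets AIRFLOW_ENV to "dev", "staging" or "prod".
ENV = os.environ.get("AIRFLOW_ENV", "dev")

with DAG(
    dag_id="example_env_aware_dag",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    # Only run on a schedule in production; elsewhere the DAG is triggered manually.
    schedule="@daily" if ENV == "prod" else None,
    catchup=False,
    tags=[ENV],
) as dag:
    EmptyOperator(task_id="placeholder")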
Automated Linting, Formatting and Testing
- We use ruff, black, isort and flake8 to lint and format Python files
- We use pytest to run the unit tests, mocking Airflow dependencies (see the example after this list)
- We test DAG execution in a staging environment with sample data
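For the unit tests, a minimal sanity check is to load every DAG file the way the scheduler would and fail on import errors. A sketch, assuming the dags/ layout shown above (the tag rule is a hypothetical house convention):

# tests/test_dag_integrity.py
from airflow.models import DagBag


def test_dags_import_without_errors():
    # Parse everything under dags/ exactly as the scheduler would.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import failures: {dag_bag.import_errors}"


def test_dags_have_tags():
    # Hypothetical house rule: every DAG must carry at least one tag.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    for dag_id, dag in dag_bag.dags.items():
        assert dag.tags, f"{dag_id} has no tags"

Running pytest in CI (as in the pipeline shown further down) catches broken imports before a DAG ever reaches the scheduler.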
Dependency Management
As with any other Python project, we use requirements.txt or pyproject.toml to manage Python dependencies, and we pin versions to avoid conflicts; Airflow is very sensitive to its dependencies.
Secrets and Configuration Management
We know that you know that you should avoid hardcoding secrets in DAGs 😉
We use Airflow Connections, stored in the metadata DB or in an external secrets backend.
We manage environment variables via Kubernetes Secrets, AWS Secrets Manager, or any other provider.
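For illustration, credentials can be fetched inside a task through a Connection or a Variable instead of being hardcoded; the connection id and variable name below are hypothetical:

from airflow.hooks.base import BaseHook
from airflow.models import Variable


def load_data():
    # "warehouse_db" is a hypothetical connection id, defined in the Airflow UI,
    # via the AIRFLOW_CONN_WAREHOUSE_DB environment variable, or in a secrets backend.
    conn = BaseHook.get_connection("warehouse_db")
    uri = f"postgresql://{conn.login}:{conn.password}@{conn.host}:{conn.port}/{conn.schema}"

    # Variables resolve through the same secrets backend chain.
    api_key = Variable.get("partner_api_key")  # hypothetical variable name
    ...

Keeping these lookups inside the task callable, rather than at module level, avoids hitting the metadata DB or secrets backend every time the scheduler parses the DAG file.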
Deployment Strategies
The choice depends on each project. Here are some typical options:
- Git-Sync: when using Kubernetes, we pull DAGs directly from a Git repo. Here is an example:
gitSync:
  enabled: true
  repo: https://your-server/your-repo/airflow-dags.git
  branch: main
- With Cloud Storage, we push DAGs to S3/GCS via CI (e.g., GitHub Actions, GitLab CI), then the Airflow instances read the DAGs from the bucket:
aws s3 sync ./dags s3://airflow-dags-bucket/
A Complete Example of a CI/CD Pipeline using GitHub Actions
name: Airflow CI/CD

on: push

jobs:
  format:
    name: Code format & Lint
    runs-on: ubuntu-latest
    steps:
      - name: Git checkout
        uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ruff
        uses: astral-sh/ruff-action@v3
        with:
          version: "0.11.13"
      - name: Check the code formatting
        run: |
          ruff check --output-format=github .
          ruff format --check .

  test:
    name: Test
    needs: format
    runs-on: ubuntu-latest
    steps:
      - name: Git checkout
        uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: pip
          restore-keys: |
            pip
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run unit tests
        run: pytest

  deploy:
    name: Deploy
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Sync DAGs to S3
        run: aws s3 sync dags/ s3://airflow-dags-bucket/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY }}
Our Favorite CI/CD Pipeline That We Use the Most at MINDIMENSIONS
Check out the upcoming Part II for a detailed description. Stay tuned 😉