Here are some "best practices" that we use at MINDIMENSIONS:
Version Control (Git) for DAGs, Plugins, Scripts, Tests and Configuration
We store all DAGs, plugins, custom operators, tests and Airflow configurations in a Git repository following the structured layout below:
airflow-project/
  ├── dags/               # DAG files
  ├── plugins/            # Custom operators/hooks
  ├── scripts/            # Deployment/helper scripts
  ├── tests/              # Tests
  ├── requirements.txt    # Python dependencies
  ├── Dockerfile          # For containerized deployments
  └── airflow.cfg         # (Optional) Configuration overrides
DAG Deployment Isolation from Core Airflow
We decouple DAGs from the core Airflow infrastructure, either by mounting DAG volumes in Kubernetes/Docker or by syncing DAGs from cloud storage (S3/GCS) to the Airflow VMs.
Environment Separation
We maintain separate environments:
- Development: local/dev Airflow instances
- Staging: mirrors production, for validation
- Production: stable and monitored
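To illustrate, here is a minimal sketch of a DAG that adapts to the environment it runs in; the AIRFLOW_ENV variable, bucket names and dag_id are our own illustrative choices (Airflow 2.4+), not a fixed convention:
# Illustrative sketch: AIRFLOW_ENV and the bucket names below are assumptions.
import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

ENV = os.environ.get("AIRFLOW_ENV", "development")  # development | staging | production

TARGET_BUCKET = {
    "development": "s3://my-data-dev",
    "staging": "s3://my-data-staging",
    "production": "s3://my-data-prod",
}[ENV]

with DAG(
    dag_id="example_env_aware",
    start_date=datetime(2024, 1, 1),
    schedule="@daily" if ENV == "production" else None,  # only scheduled in production
    catchup=False,
):
    BashOperator(
        task_id="export",
        bash_command=f"echo exporting to {TARGET_BUCKET}",
    )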
Automated Linting, Formatting and Testing
- We use ruff, black, isort and flake8 to lint and format Python files
- We use pytest to run the unit tests, mocking Airflow dependencies where needed (see the sketch below)
- We test DAG execution in a staging environment with sample data
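Here is a minimal sketch of such a unit test, assuming the dags/ folder from the layout above; it checks that every DAG file parses without import errors:
# Minimal sketch of a DAG integrity test (assumes the dags/ folder from the layout above).
import pytest
from airflow.models import DagBag


@pytest.fixture(scope="session")
def dag_bag():
    # Parse every file under dags/ once per test session; skip Airflow's bundled examples.
    return DagBag(dag_folder="dags/", include_examples=False)


def test_no_import_errors(dag_bag):
    # Fails on syntax errors, missing dependencies, or broken top-level code in any DAG file.
    assert dag_bag.import_errors == {}


def test_at_least_one_dag_is_loaded(dag_bag):
    assert len(dag_bag.dags) > 0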
Dependency Management
As with any other Python project, we use requirements.txt or pyproject.toml to manage Python dependencies, and we pin versions to avoid conflicts; Airflow is very sensitive to its dependencies.
Secrets and Configuration Management
We know that you know that you should avoid hardcoding secrets in DAGs 😉
We use Airflow Connections stored in the metadata DB or in an external secrets backend.
And we manage environment variables via Kubernetes Secrets, AWS Secrets Manager, or any other provider.
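As a sketch (the connection id warehouse_db and the dag_id are assumptions on our side, Airflow 2.4+), a task can resolve credentials from an Airflow Connection at runtime, whether it lives in the metadata DB or in a secrets backend:
# Sketch: "warehouse_db" is an illustrative connection id, not a real one.
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.hooks.base import BaseHook

with DAG(dag_id="secrets_example", start_date=datetime(2024, 1, 1), schedule=None, catchup=False):

    @task
    def extract():
        # Resolved from the metadata DB or the configured secrets backend; no credentials in code.
        conn = BaseHook.get_connection("warehouse_db")
        print(f"Connecting to {conn.host}:{conn.port} as {conn.login}")

    extract()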
Deployment Strategies
The choice depends on each project. Here are some typical options:
- Git-Sync, when using Kubernetes, to pull DAGs from a Git repo. Here is an example:
gitSync:
    enabled: true
    repo: https://your-server/your-repo/airflow-dags.git
    branch: main
- With Cloud Storage, we push DAGs to S3/GCS via CI (e.g., GitHub Actions, GitLab CI), then the Airflow instances read the DAGs from the bucket:
aws s3 sync ./dags s3://airflow-dags-bucket/
A Complete Example of a CI/CD Pipeline using GitHub Actions
name: Airflow CI/CD
on: push
jobs:
  format:
    name: Code format & Lint
    runs-on: ubuntu-latest
    steps:
      - name: Git checkout
        uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install ruff
        uses: astral-sh/ruff-action@v3
        with:
          version: "0.11.13"
      - name: Lint and check code formatting
        run: |
          ruff check --output-format=github .
          ruff format --check .
  test:
    name: Test
    needs: format
    runs-on: ubuntu-latest
    steps:
      - name: Git checkout
        uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run unit tests
        run: pytest
  deploy:
    name: Deploy
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Sync DAGs to S3
        run: aws s3 sync dags/ s3://airflow-dags-bucket/
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_KEY }}
Our Favorite CI/CD Pipeline That We Use the Most at MINDIMENSIONS
Check out the upcoming Part II for a detailed description. Stay tuned 😉
