## The Question We Keep Getting
"Isn't this just Airflow?"
When people see that AskRobots runs 17 scheduled tasks — from checking your bookmarks for broken links to processing subscription charges — the comparison to Apache Airflow comes up. It's a fair question. Both systems schedule work, track execution history, and provide a dashboard to monitor what's happening.
But the answer is no, and the difference matters.
## What Apache Airflow Is
Airflow was built at Airbnb in 2014 to manage complex data pipelines. Think: "Every night at midnight, pull data from 12 different databases, transform it, load it into a data warehouse, train a machine learning model on the result, and send a report — but only if steps 1-4 all succeeded."
It's a standalone orchestration platform. You deploy it as its own service with:
- A **web server** (the dashboard UI)
- A **scheduler** (decides what to run and when)
- A **metadata database** (tracks state)
- **Workers** (execute the actual tasks)
- Optionally a **message broker** like Redis or RabbitMQ
For a Fortune 500 company running 500 data pipelines, this makes sense. Airflow is the air traffic control tower for complex operations.
## What We Have Instead
Our scheduled tasks are built directly into the Django application. No separate service, no additional infrastructure:
| Component | Airflow | AskRobots |
|---|---|---|
| Scheduler | Separate Airflow process | Celery Beat (already running for the app) |
| Workers | Separate Airflow workers | Celery Workers (already running for the app) |
| Task history | Airflow's metadata DB | AgentRun model (same app database) |
| Dashboard | Separate web application | AI Dashboard (built into the app) |
| Configuration | Python DAG files on disk | Database records (django-celery-beat) |
| Admin UI | Airflow's own web UI | Django admin + user-facing dashboard |
The entire scheduling infrastructure was already there for handling async operations like file processing and AI API calls. We just started using it for automated agents too.
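To make the "database records" row concrete, here is a minimal sketch of how an hourly agent could be registered with django-celery-beat. The schedule and the task are both just rows in the app's database; the task path and names below are illustrative placeholders, not the actual AskRobots code.

```python
# Sketch: registering an hourly agent as database records via django-celery-beat.
# Names and the dotted task path are hypothetical examples.
from django_celery_beat.models import IntervalSchedule, PeriodicTask

# The schedule itself is a database row: "every 1 hour".
hourly, _ = IntervalSchedule.objects.get_or_create(
    every=1,
    period=IntervalSchedule.HOURS,
)

# The periodic task is another row pointing at a Celery task by dotted path.
PeriodicTask.objects.get_or_create(
    name="agents.link_checker",        # display name shown in Django admin
    task="agents.tasks.check_links",   # hypothetical Celery task path
    defaults={"interval": hourly},
)
```

Because the schedule lives in the database rather than in DAG files on disk, changing when an agent runs is an admin edit, not a deploy.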
## What Airflow Adds (That We Don't Need Yet)
### DAG Dependencies (Directed Acyclic Graphs)
Airflow's core concept is the DAG — a graph of tasks where Task B waits for Task A to complete before starting. You can build complex trees: "Run data extraction from 5 sources in parallel, then merge results, then run validation, then publish."
Our agents are independent. The link checker doesn't need the QA monitor to finish first. The contact janitor doesn't depend on billing calculations. When we eventually need dependencies (e.g., "run thumbnail generator after link checker finds new links"), we can add simple task chaining in Celery without a full DAG framework.
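If that day comes, Celery's own primitives cover simple ordering. Here's a rough sketch of the thumbnail-after-link-checker case with hypothetical task names standing in for the real agents:

```python
# Sketch: simple task chaining in Celery, no DAG framework required.
# Task names and bodies are placeholders for the agents described above.
from celery import chain, shared_task

@shared_task
def check_links():
    """Scan recent bookmarks and return IDs of newly validated links."""
    return [101, 102]  # placeholder result

@shared_task
def generate_thumbnails(link_ids):
    """Render preview thumbnails for the links the checker just validated."""
    return f"thumbnailed {len(link_ids)} links"

# chain() passes each task's return value into the next task.
chain(check_links.s(), generate_thumbnails.s()).apply_async()
```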
### Retries with Backoff Strategies
Airflow lets you define sophisticated retry policies: "Retry 3 times with exponential backoff, but only on Tuesdays, and alert Slack after the second failure."
Our agents have simple Celery retries (`max_retries=2, countdown=60`). If the link checker fails, it waits 60 seconds and tries again. That's sufficient when your tasks are small and self-contained. We don't need a policy engine for "check 10 bookmarks."
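For reference, that whole "retry policy" is a couple of arguments on the task. A minimal sketch, with a placeholder body rather than the actual link checker:

```python
# Sketch: Celery retries with max_retries=2 and a 60-second countdown.
# The task body is a placeholder, not the real link checker.
import requests
from celery import shared_task

@shared_task(bind=True, max_retries=2)
def check_links(self):
    try:
        requests.head("https://example.com", timeout=10)
    except requests.RequestException as exc:
        # Wait 60 seconds, then try again, at most twice.
        raise self.retry(exc=exc, countdown=60)
```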
### SLA Monitoring
Airflow can alert you when a task hasn't completed by its expected deadline. "The daily ETL should finish by 6 AM — if it hasn't, page the on-call engineer."
Our agents complete in seconds (the link checker runs in ~21 seconds). SLA monitoring makes sense when a pipeline takes 4 hours and might silently stall. It doesn't make sense for a task that checks 10 URLs.
### Multi-Tenant Worker Pools
Airflow lets you route tasks to specific worker pools: "GPU tasks go to GPU machines, IO tasks go to IO-optimized machines, and never run more than 5 heavy tasks at once."
We have one Celery worker handling everything. The total compute load of all 17 scheduled tasks is negligible — they're mostly HTTP requests and database queries. Worker pools solve a scaling problem we don't have.
### Kubernetes/Docker Executors
Airflow can spin up a fresh Docker container or Kubernetes pod for every single task execution. This provides perfect isolation — each task gets its own environment, dependencies, and resource limits.
Our tasks run inside the Django process. They share the application's environment, which is fine because they *are* the application. The link checker needs the Link model. The billing tasks need the Subscription model. Isolating them in containers would just add complexity.
## The Real Difference: Platform vs. Tool
Airflow is a tool you bring in to orchestrate work across systems. It connects to external databases, triggers Spark jobs, monitors third-party APIs. It's infrastructure for infrastructure.
Our service agents are part of the product itself. They exist because users saved bookmarks and those bookmarks should stay maintained. The scheduling is incidental — the point is that the link checker *comes with the service*. You don't install it, configure it, or pay extra for it. You save a link, and the platform takes care of it.
This is closer to how Gmail filters spam than how a data team runs ETL. Gmail doesn't need Airflow to filter your email. The spam filter is just what email does.
## When We'd Actually Need Airflow
If we ever need to:
- Chain 10+ tasks with complex dependency graphs
- Run tasks across multiple servers with different capabilities
- Process data pipelines that take hours and might fail halfway through
- Manage hundreds of task definitions with different teams owning different DAGs
Then Airflow (or a similar orchestrator) would make sense. But for running a QA monitor every 6 hours, checking 10 bookmarks hourly, and cleaning up contacts weekly — Celery Beat and a database table are the right tools.
## Current State
17 scheduled tasks running on a single $7/month server:
- **3 service agents** — QA site monitoring, link validation, contact data quality
- **13 billing operations** — subscriptions, storage limits, balance checks, charges
- **1 system task** — Celery housekeeping
All visible in the AI Dashboard. All managed through the database. No separate orchestrator needed.
The goal isn't to replace Airflow. It's to recognize that most applications already have everything they need to run automated maintenance — they just never bother to build the agents.