Case Study
Lineup
Concert discovery iOS app with production data platform
Ingesting 1M+ events from 6 APIs via Apache Airflow into Supabase, with BigQuery analytics and dbt transformations.
Overview
Lineup is a concert discovery iOS app paired with a production data platform. The platform orchestrates daily syncs from six API sources (Ticketmaster, Skiddle, Resident Advisor, Concert Archives, Spotify, MusicBrainz) using Apache Airflow, landing data into Supabase as the operational database and GCS/BigQuery as the analytics layer. The stack spans 1M+ events and 300K+ artists with dbt transformations and Looker Studio dashboards.
Problem
Concert discovery is fragmented. No single platform aggregates events across the full breadth of the live music calendar. The app solves this by consolidating events globally, with iOS integration for easy tracking and sharing.
Key Decisions and Trade-offs
- Airflow for orchestration. Chose Apache Airflow with CeleryExecutor and Redis caching for robust, observable scheduling. Alternative was Lambda-based serverless, but Airflow gives better visibility and control.
- Multi-source deduplication. Three-step matching logic: source ID to MusicBrainz ID first, fuzzy name/location matching for what doesn't match, then an `ExternalIdManager` class holding persistent cross-source mappings in dedicated tables. Handles conflicts across Ticketmaster, Skiddle, Resident Advisor, Concert Archives, Spotify, and MusicBrainz. Trade-off is complexity; payoff is high data quality.
- Write-through pattern. Simultaneous writes to Supabase (operational) and GCS Parquet (analytics) to avoid coupling. Slight latency cost; the two systems stay independent.
- BigQuery warehouse. Chosen over a traditional data lake for cost efficiency and SQL ergonomics. dbt layered architecture (staging, intermediate, marts) keeps transforms maintainable.
- Redis over a 57MB JSON cache. Replaced an in-repo JSON cache with Redis 7.x as a distributed cache for Celery workers. Smaller worker memory footprint, faster warm starts, and proper eviction semantics.
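The three-step matching in the deduplication bullet above can be sketched roughly as follows. This is a minimal illustration, not the project's `ExternalIdManager`: the class `IdMappingSketch`, the `SourceRecord` fields, the `canon-N` ID format, the 0.9 similarity threshold, and the in-memory dictionaries (standing in for the dedicated mapping tables) are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    source: str                 # e.g. "ticketmaster"
    source_id: str              # ID assigned by that source
    name: str
    location: str
    mbid: Optional[str] = None  # MusicBrainz ID, if the source supplies one


def _similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


class IdMappingSketch:
    """In-memory stand-in for persistent cross-source mappings (hypothetical)."""

    def __init__(self, name_threshold: float = 0.9) -> None:
        self.name_threshold = name_threshold  # assumed cutoff, not from the source
        self.by_source_id = {}  # (source, source_id) -> canonical ID
        self.by_mbid = {}       # MusicBrainz ID -> canonical ID
        self.canonical = {}     # canonical ID -> first record seen for it

    def resolve(self, rec: SourceRecord) -> str:
        key = (rec.source, rec.source_id)

        # Step 1: exact ID match, via a stored source ID or a shared MusicBrainz ID.
        cid = self.by_source_id.get(key)
        if cid is None and rec.mbid is not None:
            cid = self.by_mbid.get(rec.mbid)

        # Step 2: fuzzy name match, restricted to records in the same location.
        if cid is None:
            cid = self._fuzzy_match(rec)

        # Nothing matched: treat the record as a new entity.
        if cid is None:
            cid = f"canon-{len(self.canonical) + 1}"
            self.canonical[cid] = rec

        # Step 3: store the mapping so later syncs resolve in step 1.
        self.by_source_id[key] = cid
        if rec.mbid is not None:
            self.by_mbid.setdefault(rec.mbid, cid)
        return cid

    def _fuzzy_match(self, rec: SourceRecord) -> Optional[str]:
        best_id, best_score = None, self.name_threshold
        for cid, known in self.canonical.items():
            if known.location.strip().lower() != rec.location.strip().lower():
                continue
            score = _similarity(known.name, rec.name)
            if score >= best_score:
                best_id, best_score = cid, score
        return best_id
```

Storing the mapping after every resolution means the fuzzy step only runs the first time a source record is seen; repeat syncs resolve by exact ID.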
Stack and Why
| Layer | Technology | Rationale |
|---|---|---|
| Orchestration | Apache Airflow 2.10 | Observable, scalable DAG scheduling with retry logic and alerting. |
| Operational DB | Supabase (PostgreSQL) | RLS for row-level security, real-time subscriptions, low operational overhead. |
| Analytics Warehouse | BigQuery | Serverless, cost-efficient for analytic queries, integrates with dbt natively. |
| Transformations | dbt | Version-controlled, testable SQL transforms with layered architecture. |
| Frontend | Swift/SwiftUI, iOS 26 | Native iOS performance and Apple ecosystem integration. |
| Infrastructure | Docker Compose, GitHub Actions | Local reproducibility, CI/CD for testing and deployment. |
What Shipped
- Airflow DAGs: 10 production DAGs orchestrating daily syncs from 6 API sources with source-specific selectors. Retry config: 3 retries with exponential backoff, plus Slack and email alerting on failure.
- Modular Python package: Pipeline refactored into an installable package (`src/lineup_pipeline/`) with proper module structure, dependency management, and CI validation so changes don't silently break downstream jobs.
- BigQuery warehouse: dbt project with staging, intermediate, and mart layers. Data quality tests enforcing uniqueness and referential integrity.
- Entity deduplication: Three-step matching logic handling conflicts across sources. `ExternalIdManager` class managing all mappings with dedicated persistence tables.
- iOS app: Swift/SwiftUI frontend with event browsing, tracking, and sharing.
- Analytics dashboards: Looker Studio dashboards for pipeline health, event distribution, and data quality metrics.
- Testing and CI/CD: 122 unit tests with GitHub Actions for automated testing on every commit.
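The retry and alerting settings listed for the Airflow DAGs could be expressed as shared task defaults along these lines. Only the three retries, the exponential backoff, and the email and Slack alerting come from the list above; the owner, base delay, delay cap, email address, and both helper functions are placeholders assumed for illustration, and the Slack call itself is left out.

```python
from datetime import timedelta


def format_failure_message(context: dict) -> str:
    """Build a short alert line from the Airflow task context."""
    ti = context["task_instance"]
    return f"Task {ti.task_id} in DAG {ti.dag_id} failed"


def alert_on_failure(context: dict) -> None:
    # Placeholder: a real callback would send this message to Slack.
    print(format_failure_message(context))


# Task-level defaults, intended to be passed as DAG(..., default_args=DEFAULT_ARGS).
DEFAULT_ARGS = {
    "owner": "lineup",                       # assumed
    "retries": 3,                            # from the list above
    "retry_delay": timedelta(minutes=5),     # assumed base delay
    "retry_exponential_backoff": True,       # from the list above
    "max_retry_delay": timedelta(hours=1),   # assumed cap on the backoff
    "email": ["alerts@example.com"],         # placeholder address
    "email_on_failure": True,
    "email_on_retry": False,
    "on_failure_callback": alert_on_failure,
}
```

Keeping these in one dictionary lets every DAG share the same failure behaviour, while an individual task can still override a key where a source needs different handling.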
Metrics
- 1M+ events across all sources
- 300K+ unique artists
- 6 active API integrations
- 122 unit tests covering pipeline logic
- 122 dbt data quality tests
- Zero production data losses in 12 months of operation
What Is Next
Incremental loads to cut BigQuery costs. Analytics expansion: attribution modelling for venue popularity trends and artist discovery patterns (which events and artists drive ticket sales, where emerging genres surface). Integration roadmap: Live Nation API for major promoter events, then regional promoters for deeper market coverage.
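One common way to approach incremental loads is a high-watermark filter: each run processes only rows changed since the previous run, then advances the watermark. The sketch below shows the idea in plain Python; the `updated_at` field and the function name are assumptions, and in this stack the same logic would more likely live in a dbt incremental model, which is not shown here.

```python
from datetime import datetime


def select_incremental(rows: list, last_watermark: datetime):
    """Return rows updated after the watermark, plus the new watermark."""
    fresh = [r for r in rows if r["updated_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts there.
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark
```

Because each run touches only the changed rows, the volume scanned per run tracks the daily change rate instead of the full table size.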
Learn more: GitHub repository