Amtrak Ingestion Documentation

Welcome to the Amtrak Ingestion documentation. This project is a serverless data pipeline built on AWS Chalice that ingests, enriches, and processes real-time train data from multiple transit providers (Amtrak, VIA Rail, and Brightline).

Features

  • Real-time train data ingestion from the Amtraker API

  • GTFS data enrichment for scheduled metrics

  • Multi-provider support (Amtrak, VIA Rail, Brightline)

  • Automated event generation for arrivals and departures

  • AWS S3 storage with efficient compression

  • Daily data collation and aggregation

  • Serverless architecture using AWS Lambda
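As an illustration of the arrival and departure event generation listed above, one common approach is to compare two consecutive snapshots of a train's stops and emit an event whenever a timestamp appears that was not there before. The sketch below assumes that approach; the `StopState` shape, the field names `arrived_at` and `departed_at`, the `ARR`/`DEP` labels, and the function `diff_events` are all hypothetical and are not taken from this project's code.

```python
from typing import Dict, List, Optional, TypedDict


class StopState(TypedDict):
    """Hypothetical per-station state for one train in one snapshot."""
    arrived_at: Optional[str]   # ISO 8601 timestamp, or None if not yet arrived
    departed_at: Optional[str]  # ISO 8601 timestamp, or None if not yet departed


# Hypothetical snapshot shape: station code -> state at that station.
Snapshot = Dict[str, StopState]

EMPTY: StopState = {"arrived_at": None, "departed_at": None}


def diff_events(train_id: str, previous: Snapshot, current: Snapshot) -> List[dict]:
    """Emit an event for each timestamp present now but absent in the previous snapshot."""
    events: List[dict] = []
    for station, state in current.items():
        before = previous.get(station, EMPTY)
        if state["arrived_at"] and not before["arrived_at"]:
            events.append({"train_id": train_id, "station": station,
                           "type": "ARR", "time": state["arrived_at"]})
        if state["departed_at"] and not before["departed_at"]:
            events.append({"train_id": train_id, "station": station,
                           "type": "DEP", "time": state["departed_at"]})
    return events


previous = {"NYP": {"arrived_at": None, "departed_at": None}}
current = {"NYP": {"arrived_at": "2024-03-01T12:00:00Z", "departed_at": None}}
events = diff_events("91", previous, current)
```

Diffing against the previous snapshot keeps each polling run from re-emitting events that were already recorded, provided the previous snapshot is persisted between runs.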

Quick Start

  1. Install dependencies:

    pip install -e ".[dev]"
    
  2. Configure AWS credentials and environment variables

  3. Deploy to AWS:

    cd amtraker_ingestion
    chalice deploy
    
  4. The pipeline will automatically:

    • Update GTFS cache daily at 2:00 AM UTC

    • Poll the Amtraker API every 5 minutes

    • Collate previous day’s data at 3:00 AM UTC
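The schedule above can be written down in AWS EventBridge schedule-expression syntax, and the "previous day" covered by the 3:00 AM collation can be computed from the run time. This is a minimal sketch under assumptions: the job names in `SCHEDULES` and the helper `collation_target` are hypothetical, and it assumes the day boundary is the UTC calendar day, which the project may define differently.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical job names mapped to EventBridge schedule expressions.
# cron() fields: minutes hours day-of-month month day-of-week year (UTC).
SCHEDULES = {
    "update_gtfs_cache": "cron(0 2 * * ? *)",     # daily at 02:00 UTC
    "consume_amtraker_api": "rate(5 minutes)",    # every 5 minutes
    "collate_previous_day": "cron(0 3 * * ? *)",  # daily at 03:00 UTC
}


def collation_target(run_time: datetime) -> date:
    """Return the UTC calendar date a collation run at run_time would cover.

    Assumes run_time is timezone-aware and that "previous day" means the
    previous UTC calendar day.
    """
    return (run_time.astimezone(timezone.utc) - timedelta(days=1)).date()


run = datetime(2024, 3, 1, 3, 0, tzinfo=timezone.utc)
target = collation_target(run)
```

Note that a `rate(5 minutes)` schedule fires at five-minute intervals from when the rule is created, not necessarily on clock boundaries such as :00 and :05.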
