Amtrak Ingestion Documentation¶
Welcome to the Amtrak Ingestion documentation. This project is a serverless data pipeline built on AWS Chalice that ingests, enriches, and processes real-time train data from multiple transit providers (Amtrak, VIA Rail, and Brightline).
Contents:
Features¶
Real-time train data ingestion from Amtraker API
GTFS data enrichment for scheduled metrics
Multi-provider support (Amtrak, VIA Rail, Brightline)
Automated event generation for arrivals and departures
AWS S3 storage with efficient compression
Daily data collation and aggregation
Serverless architecture using AWS Lambda
Quick Start¶
Install dependencies:
pip install -e ".[dev]"
Configure AWS credentials and environment variables
Deploy to AWS:
cd amtraker_ingestion chalice deploy
The pipeline will automatically:
Update GTFS cache daily at 2:00 AM UTC
Consume Amtraker API every 5 minutes
Collate previous day’s data at 3:00 AM UTC