Amtrak Ingestion Documentation

Welcome to the Amtrak Ingestion documentation. This project is a serverless data pipeline built on AWS Chalice that ingests, enriches, and processes real-time train data from multiple transit providers (Amtrak, VIA Rail, and Brightline).

Features

- Real-time train data ingestion from the Amtraker API
- GTFS data enrichment for scheduled metrics
- Multi-provider support (Amtrak, VIA Rail, Brightline)
- Automated event generation for arrivals and departures
- AWS S3 storage with efficient compression
- Daily data collation and aggregation
- Serverless architecture using AWS Lambda
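
As a rough illustration of the compressed-storage feature, the sketch below packs a batch of train records into gzip-compressed, newline-delimited JSON before it would be uploaded to S3. The compression format, the record fields, and the function names are all assumptions made for this example; the project's actual storage layout may differ.

```python
import gzip
import json


def compress_records(records: list[dict]) -> bytes:
    """Serialize records as newline-delimited JSON and gzip the result."""
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(payload.encode("utf-8"))


def decompress_records(blob: bytes) -> list[dict]:
    """Reverse compress_records: gunzip, then parse one JSON object per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]


# Hypothetical sample records; the field names are illustrative only.
sample = [
    {"provider": "Amtrak", "train_id": "example-1"},
    {"provider": "VIA Rail", "train_id": "example-2"},
]

blob = compress_records(sample)
restored = decompress_records(blob)
```

Newline-delimited JSON keeps each record independently parseable, which can make later daily collation simpler, though whether the pipeline uses this layout is not stated here.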

Quick Start

Install dependencies:
```
pip install -e ".[dev]"
```
Configure AWS credentials and environment variables
Deploy to AWS:
```
cd amtraker_ingestion
chalice deploy
```
The pipeline will automatically:
- Update the GTFS cache daily at 2:00 AM UTC
- Poll the Amtraker API every 5 minutes
- Collate the previous day’s data at 3:00 AM UTC
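
The schedule above can be written as AWS EventBridge schedule expressions, which is the syntax Lambda scheduled triggers use. The sketch below is a minimal illustration, not the project's code: the job names and the `collation_target` helper are hypothetical, and it assumes "previous day" means the previous UTC calendar day.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical job names mapped to EventBridge schedule expressions.
# Cron fields: minutes hours day-of-month month day-of-week year (UTC).
SCHEDULES = {
    "update_gtfs_cache": "cron(0 2 * * ? *)",     # daily at 02:00 UTC
    "consume_amtraker_api": "rate(5 minutes)",    # every 5 minutes
    "collate_previous_day": "cron(0 3 * * ? *)",  # daily at 03:00 UTC
}


def collation_target(run_time: datetime) -> date:
    """Return the UTC calendar day that a collation run would process."""
    return (run_time.astimezone(timezone.utc) - timedelta(days=1)).date()


# A run at 03:00 UTC on 1 March 2024 would collate data for 29 February 2024.
run = datetime(2024, 3, 1, 3, 0, tzinfo=timezone.utc)
target = collation_target(run)
```

Running collation an hour after the GTFS refresh leaves time for the cache update to finish first, assuming the two jobs are meant to be ordered that way.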
