Configuration
This document describes the configuration options for the Amtrak Ingestion system.
Environment Variables
The following environment variables can be set to configure the application:
| Variable | Required | Description |
|---|---|---|
| | No | AWS region for S3 and Lambda (default: `us-east-1`) |
| | No | AWS access key (use IAM roles in production) |
| | No | AWS secret key (use IAM roles in production) |
Provider Configuration
Provider enablement is configured in `chalicelib/config.py`:
```python
# Provider enablement flags
AMTRAK_ENABLED = True
VIA_ENABLED = True
BRIGHTLINE_ENABLED = False
```
To enable or disable a provider:
1. Open `chalicelib/config.py`.
2. Set the corresponding flag to `True` or `False`.
3. Redeploy the application.
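How the application consumes these flags isn't shown here. A minimal sketch, assuming a `Provider` enum like the one used as keys in `GTFS_URLS` (its member values here are hypothetical) and a hypothetical `enabled_providers()` helper, could filter providers like this:

```python
from enum import Enum


# Hypothetical stand-in for the project's Provider enum.
class Provider(Enum):
    AMTRAK = "amtrak"
    VIA = "via"
    BRIGHTLINE = "brightline"


# Flags mirroring chalicelib/config.py.
AMTRAK_ENABLED = True
VIA_ENABLED = True
BRIGHTLINE_ENABLED = False

# Hypothetical mapping from each provider to its enablement flag.
PROVIDER_FLAGS = {
    Provider.AMTRAK: AMTRAK_ENABLED,
    Provider.VIA: VIA_ENABLED,
    Provider.BRIGHTLINE: BRIGHTLINE_ENABLED,
}


def enabled_providers() -> list[Provider]:
    """Return the providers whose flag is True, in declaration order."""
    return [provider for provider, enabled in PROVIDER_FLAGS.items() if enabled]
```

Because the flags are plain module-level constants, a change is picked up only when the updated code is deployed, which is why the last step above is a redeploy.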
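Constants
Key constants are defined in `chalicelib/constants.py`:
S3 Bucket
```python
S3_BUCKET = "amtrak-performance"
```
The S3 bucket where all data is stored. To use a different bucket, change this value and also update the bucket ARNs in the IAM policy (`.chalice/policy-dev.json`), which grants access to `amtrak-performance` by name.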
API Endpoints
```python
AMTRAKER_TRAINS_API = "https://api-v3.amtraker.com/v3/trains"
GTFS_URLS = {
    Provider.AMTRAK: "https://content.amtrak.com/content/gtfs/GTFS.zip",
    Provider.VIA: "https://www.viarail.ca/sites/all/files/gtfs/viarail.zip",
    Provider.BRIGHTLINE: "https://www.gobrightline.com/gtfs/gtfs.zip",
}
```
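Each `GTFS_URLS` entry points at a zipped GTFS feed, a set of CSV text files such as `routes.txt` and `stops.txt`. How the project parses these archives isn't shown here; a minimal standard-library sketch, using a hypothetical `read_gtfs_table()` helper that takes the archive bytes (for example from `urllib.request.urlopen(url).read()`), might look like this:

```python
import csv
import io
import zipfile


def read_gtfs_table(zip_bytes: bytes, table: str) -> list[dict[str, str]]:
    """Return the rows of one GTFS table (e.g. "routes.txt") from a feed archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(table) as raw:
            # GTFS tables are UTF-8 CSV; "utf-8-sig" also tolerates a leading BOM.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            # Each row becomes a dict keyed by the header line's column names.
            return list(csv.DictReader(text))
```

Working from bytes rather than a URL keeps the parsing testable without network access and lets the caller decide how to download and cache the feed.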
Timezone
```python
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")
```
The primary timezone for service date calculations.
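The exact service-date rule isn't specified here. A minimal sketch, assuming the service date is simply the US Eastern calendar date of an event (many transit systems instead roll the service day over a few hours after midnight, which this does not do), with a hypothetical `service_date()` helper:

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def service_date(event_time: datetime) -> date:
    """Map a timezone-aware timestamp to its calendar date in US Eastern time."""
    if event_time.tzinfo is None:
        # Naive datetimes are ambiguous; refuse them rather than guess a zone.
        raise ValueError("event_time must be timezone-aware")
    return event_time.astimezone(EASTERN).date()


# 03:00 UTC on 15 July is 23:00 EDT on 14 July.
print(service_date(datetime(2024, 7, 15, 3, 0, tzinfo=timezone.utc)))  # 2024-07-14
```

Using `ZoneInfo` rather than a fixed UTC offset means daylight saving transitions are handled by the tz database.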
CSV Fields
```python
CSV_FIELDS = [
    "service_date",
    "route_id",
    "trip_id",
    "direction_id",
    "stop_id",
    "stop_sequence",
    "vehicle_id",
    "vehicle_label",
    "event_type",
    "event_time",
    "scheduled_headway",
    "scheduled_tt",
]
```
Fields written to the collated CSV files.
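The collation code itself isn't shown here. As a minimal sketch, assuming each event is held as a dict keyed by these field names, a hypothetical `rows_to_csv()` helper could use `csv.DictWriter` so that columns always appear in `CSV_FIELDS` order:

```python
import csv
import io

CSV_FIELDS = [
    "service_date",
    "route_id",
    "trip_id",
    "direction_id",
    "stop_id",
    "stop_sequence",
    "vehicle_id",
    "vehicle_label",
    "event_type",
    "event_time",
    "scheduled_headway",
    "scheduled_tt",
]


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize event rows to CSV text with a header in CSV_FIELDS order."""
    buffer = io.StringIO()
    # Missing keys are written as empty strings (DictWriter's default restval);
    # unexpected keys raise ValueError (the default extrasaction="raise").
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Raising on unknown keys surfaces schema drift early instead of silently dropping columns.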
Logging Configuration
Logging is configured in `chalicelib/config.py`:
```python
import logging


def setup_logging():
    """Configure logging for the application."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s | %(levelname)s | %(name)s | %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
```
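As written, `setup_logging()` always uses `logging.INFO`, so the `LOG_LEVEL` environment variable set in the Lambda stage configuration has no effect unless some code reads it. A minimal sketch of a variant that honors it, with the hypothetical name `setup_logging_from_env`:

```python
import logging
import os

LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def setup_logging_from_env(default: str = "INFO") -> int:
    """Configure root logging from LOG_LEVEL, falling back to `default`."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    # getLevelName returns an int for known level names, a string otherwise.
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        level = logging.getLevelName(default.upper())
    # force=True (Python 3.8+) replaces any handlers already on the root logger,
    # such as one the AWS Lambda Python runtime typically installs; without it,
    # basicConfig() would do nothing in that case.
    logging.basicConfig(level=level, format=LOG_FORMAT, datefmt=DATE_FORMAT, force=True)
    return level
```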
Log Levels
- `DEBUG`: detailed debugging information
- `INFO`: general operational information
- `WARNING`: warning messages for potential issues
- `ERROR`: error messages for failures
Getting a Logger
```python
from chalicelib.config import get_logger

logger = get_logger(__name__)
logger.info("Processing started")
```
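The body of `get_logger` isn't shown here. Assuming it is a thin wrapper over `logging.getLogger` (an assumption), the sketch below shows what the `%(name)s` field and level filtering produce. The logger name `chalicelib.example` is hypothetical, and the demo attaches its own handler instead of calling `setup_logging()`:

```python
import io
import logging

FORMAT = "%(levelname)s | %(name)s | %(message)s"


def get_logger(name: str) -> logging.Logger:
    """Hypothetical: return the standard-library logger registered under `name`."""
    return logging.getLogger(name)


# Demo: send one logger's records to an in-memory buffer.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter(FORMAT))

logger = get_logger("chalicelib.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root handlers

logger.info("Processing started")
logger.debug("Suppressed: below the INFO threshold")
print(buffer.getvalue(), end="")  # INFO | chalicelib.example | Processing started
```

Passing `__name__` makes the module path appear in `%(name)s`, which is what lets log lines be traced back to their source module.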
Chalice Configuration
Chalice configuration is stored in `amtraker_ingestion/.chalice/config.json`:
```json
{
  "version": "2.0",
  "app_name": "amtraker-ingestion",
  "stages": {
    "dev": {
      "api_gateway_stage": "api"
    }
  }
}
```
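Lambda Settings
Configure Lambda function settings per stage under `"stages"` in the Chalice config. `lambda_memory_size` is in MB and `lambda_timeout` is in seconds:
```json
{
  "stages": {
    "prod": {
      "lambda_memory_size": 512,
      "lambda_timeout": 300,
      "environment_variables": {
        "LOG_LEVEL": "INFO"
      }
    }
  }
}
```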
IAM Policy
The IAM policy is defined in `.chalice/policy-dev.json`:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::amtrak-performance",
        "arn:aws:s3:::amtrak-performance/*"
      ]
    }
  ]
}
```
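Because the bucket name is hard-coded in both `S3_BUCKET` and this policy, the two can drift apart. One way to keep them in sync is to generate the policy from the bucket name. Below is a minimal sketch with a hypothetical `s3_policy()` helper; writing its output to `.chalice/policy-dev.json` is left to the reader:

```python
import json


def s3_policy(bucket: str) -> dict:
    """Build the IAM policy document shown above for an arbitrary bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": "arn:aws:logs:*:*:*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
                # s3:ListBucket applies to the bucket ARN;
                # s3:GetObject and s3:PutObject apply to the object ARNs ("/*").
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(s3_policy("amtrak-performance"), indent=2))
```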
Schedule Configuration
Scheduled tasks are defined in `app.py` using Chalice decorators. Schedule times are evaluated in UTC:
```python
from chalice import Chalice, Cron, Rate

app = Chalice(app_name="amtraker-ingestion")


@app.schedule(Cron(0, 2, "*", "*", "?", "*"))
def update_gtfs_cache(event):
    """Run daily at 2:00 AM UTC."""
    pass


@app.schedule(Rate(5, unit=Rate.MINUTES))
def consume_amtraker_api(event):
    """Run every 5 minutes."""
    pass


@app.schedule(Cron(0, 3, "*", "*", "?", "*"))
def collate_previous_day(event):
    """Run daily at 3:00 AM UTC."""
    pass
```
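These schedules run in UTC, while service dates are calculated in US Eastern time (the `EASTERN` constant). So the 3:00 AM UTC collation run falls at 10:00 PM or 11:00 PM Eastern on the previous calendar day, depending on daylight saving time. Whether that matters depends on how `collate_previous_day` chooses its date, which isn't shown here. A quick check with a hypothetical `utc_run_in_eastern()` helper:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def utc_run_in_eastern(year: int, month: int, day: int, hour: int, minute: int = 0) -> datetime:
    """Convert a UTC run time (as EventBridge evaluates cron) to US Eastern."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc).astimezone(EASTERN)


print(utc_run_in_eastern(2024, 1, 15, 3).isoformat())  # 2024-01-14T22:00:00-05:00 (EST)
print(utc_run_in_eastern(2024, 7, 15, 3).isoformat())  # 2024-07-14T23:00:00-04:00 (EDT)
```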
Customizing Schedules
Cron expressions follow AWS EventBridge syntax. Times are in UTC, and exactly one of `day_of_month` or `day_of_week` must be `"?"`:
```
Cron(minutes, hours, day_of_month, month, day_of_week, year)
```
Examples:
- `Cron(0, 2, "*", "*", "?", "*")`: daily at 2:00 AM UTC
- `Cron(30, 8, "*", "*", "?", "*")`: daily at 8:30 AM UTC
- `Cron(0, "*/6", "*", "*", "?", "*")`: every 6 hours
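Each `Cron(...)` object corresponds to an EventBridge `cron(...)` schedule expression with the six fields in the same order. To preview the expression a given set of fields produces, and to catch the day-field rule before deploying, here is a stand-alone sketch. The `cron_expression` helper is hypothetical and not part of Chalice:

```python
def cron_expression(minutes, hours, day_of_month, month, day_of_week, year) -> str:
    """Render EventBridge cron fields as a cron(...) schedule expression string."""
    fields = [str(f) for f in (minutes, hours, day_of_month, month, day_of_week, year)]
    # EventBridge requires exactly one of day-of-month / day-of-week to be "?".
    if (fields[2] == "?") == (fields[4] == "?"):
        raise ValueError('exactly one of day_of_month or day_of_week must be "?"')
    return f"cron({' '.join(fields)})"


print(cron_expression(30, 8, "*", "*", "?", "*"))  # cron(30 8 * * ? *)
```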
Rate expressions:
- `Rate(5, unit=Rate.MINUTES)`: every 5 minutes
- `Rate(1, unit=Rate.HOURS)`: every hour
- `Rate(1, unit=Rate.DAYS)`: every day
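In raw EventBridge syntax, a rate expression uses the singular unit when the value is 1 (`rate(1 hour)`) and the plural otherwise (`rate(5 minutes)`). Below is a minimal sketch of that rule, useful when writing expressions by hand, for example in the AWS console. The `rate_expression` helper and its lowercase unit strings are hypothetical:

```python
# Plural unit names mapped to the singular form EventBridge expects for a value of 1.
UNITS = {"minutes": "minute", "hours": "hour", "days": "day"}


def rate_expression(value: int, unit: str) -> str:
    """Render an EventBridge rate(...) expression, singular for a value of 1."""
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    if unit not in UNITS:
        raise ValueError(f"unsupported unit: {unit}")
    return f"rate({value} {UNITS[unit] if value == 1 else unit})"


print(rate_expression(1, "hours"))  # rate(1 hour)
```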