GTFS - General Transit Feed Specification
Introduction to GTFS
GTFS stands for General Transit Feed Specification. It's a standardised data format to describe a public transportation network, such as a country's railway or a city's bus system.
Originally, it only covered the static or scheduled part of a network, such as the geolocation of stops, the sequence of stops that make up a route, timetables, fares, etc. The specification was then extended to real-time updates (including vehicle delays, canceled trips, and more) and to flexible, on-demand services.
The static part is now commonly referred to as simply GTFS, GTFS Schedule, or GTFS Static. The two extensions are respectively named GTFS Realtime and GTFS Flex.
The point of it all is to provide a standard, documented, consistent method for sharing public transportation data between the agencies providing these services and the software developers who consume the data to build a wide range of tools, such as itinerary planners and passenger information displays.
Let's skip over the boring part of who started it (Google), when (2005) and why (for Google Maps to support itineraries using public transportation); you can read all about this on Wikipedia. What we care about is what GTFS is, practically speaking.
What is GTFS Schedule?
It's a dataset in the form of a ZIP file containing a series of TXT files that are actually CSV files, which together represent a relational database. That's a lot of layers! Let's peel them off one by one.
- a ZIP file...
Download Switzerland's GTFS file to see what it looks like. It's a normal ZIP (compressed) file.
- ...containing a series of TXT files...
Unzip the downloaded file. It contains a bunch of TXT files such as stops.txt, routes.txt, agency.txt, etc. Each file describes a pillar of the transit network. All supported files are listed in the official documentation.
- ...that are actually CSV files...
Open the agency.txt file. It looks like this:
agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone
"11","Schweizerische Bundesbahnen SBB","http://www.sbb.ch/","Europe/Berlin","DE","0848 44 66 88"
"87_LEX","Société Nationale des Chemins de fer Français","http://www.sbb.ch/","Europe/Berlin","DE","0848 44 66 88"
The first row is a list of comma-separated headers. The following rows are lists of comma-separated values.
- ...which represent a relational database.
The ZIP file represents a database. Each file is a table in that database. The file's headers are the table's columns, and the following rows are the table's records.
For example, the downloaded Swiss file represents a GTFS database, the agency.txt file is the agency table, and the columns and values are those illustrated above.
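Put together, those layers mean a feed can be read with nothing more than a ZIP reader and a CSV parser. Here's a minimal Python sketch (Python is just our choice for the examples; the function name load_gtfs is hypothetical) that turns each TXT file into a table, i.e. a list of rows keyed by column name. It keeps every value as a string and doesn't validate anything against the specification.

```python
import csv
import io
import zipfile


def load_gtfs(source):
    """Load a GTFS Schedule ZIP into {table_name: [row_dict, ...]}.

    `source` can be a file path or a binary file-like object.
    All values are kept as strings, exactly as they appear in the CSV.
    """
    tables = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue  # ignore directories and non-GTFS files
            # "agency.txt" -> "agency" (also tolerates a folder prefix)
            table = name.rsplit("/", 1)[-1][: -len(".txt")]
            # GTFS files are UTF-8; "utf-8-sig" also drops a leading BOM if present.
            with io.TextIOWrapper(archive.open(name), encoding="utf-8-sig", newline="") as text:
                # The header row becomes the dict keys (the table's columns),
                # every following row becomes one record.
                tables[table] = list(csv.DictReader(text))
    return tables
```

For example, load_gtfs("gtfs.zip")["agency"][0]["agency_name"] would give the first agency's name, with "gtfs.zip" standing in for wherever you saved the download.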
Notice how some headers have obvious types and roles. For example, agency_id is clearly meant to be used as a primary key in the agency table, and as a foreign key in others (such as routes.txt).
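To see that primary key / foreign key relationship at work, here's a sketch using Python's built-in sqlite3 module with two trimmed-down tables. The agency rows come from the Swiss agency.txt above; the route rows (route_1, route_2, R1, R2) are hypothetical, made up for illustration.

```python
import sqlite3

# Two GTFS tables, trimmed to a few columns for this example.
SCHEMA = """
CREATE TABLE agency (
    agency_id   TEXT PRIMARY KEY,                         -- primary key
    agency_name TEXT NOT NULL
);
CREATE TABLE routes (
    route_id         TEXT PRIMARY KEY,
    agency_id        TEXT REFERENCES agency (agency_id),  -- foreign key to agency
    route_short_name TEXT
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on (per connection).
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def routes_with_agency(conn):
    """Follow each route's agency_id to the agency that runs it."""
    return conn.execute(
        """
        SELECT r.route_id, r.route_short_name, a.agency_name
        FROM routes AS r
        JOIN agency AS a ON a.agency_id = r.agency_id
        ORDER BY r.route_id
        """
    ).fetchall()


conn = open_db()
conn.executemany(
    "INSERT INTO agency VALUES (?, ?)",
    [("11", "Schweizerische Bundesbahnen SBB"),
     ("87_LEX", "Société Nationale des Chemins de fer Français")],
)
# Hypothetical routes, made up for illustration.
conn.executemany(
    "INSERT INTO routes VALUES (?, ?, ?)",
    [("route_1", "11", "R1"), ("route_2", "87_LEX", "R2")],
)
print(routes_with_agency(conn))
```

With the pragma enabled, inserting a route whose agency_id doesn't exist in agency raises sqlite3.IntegrityError, which is a cheap way to catch broken references when importing a feed.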
Here's an illustration of the core GTFS tables, columns, and relationships:
erDiagram
    "agency (required)" { Primary_key agency_id "Required*" Text agency_name "Required" URL agency_url "Required" Timezone agency_timezone "Required" Language_code agency_lang "Optional" Phone_number agency_phone "Optional" URL agency_fare_url "Optional" Email agency_email "Optional" }
    "stops (required)" { Primary_key stop_id "Required" Foreign_key parent_station "Required*" Foreign_key level_id "Optional" Text stop_code "Optional" Text stop_name "Required*" Text tts_stop_name "Optional" Text stop_desc "Optional" Latitude stop_lat "Required*" Longitude stop_lon "Required*" Id zone_id "Optional" URL stop_url "Optional" Enum location_type "Optional" Timezone stop_timezone "Optional" Enum wheelchair_boarding "Optional" Text platform_code "Optional" }
    "levels (optional)" { Primary_key level_id "Required" Float level_index "Required" Text level_name "Optional" }
    "pathways (optional)" { Primary_key pathway_id "Required" Foreign_key from_stop_id "Required" Foreign_key to_stop_id "Required" Enum pathway_mode "Required" Enum is_bidirectional "Required" Float length "Optional" Integer traversal_time "Optional" Integer stair_count "Optional" Float max_slope "Optional" Float min_width "Optional" Text signposted_as "Optional" Text reversed_signposted_as "Optional" }
    "routes (required)" { Primary_key route_id "Required" Foreign_key agency_id "Required*" Text route_short_name "Required*" Text route_long_name "Required*" Text route_desc "Optional" Enum route_type "Required" URL route_url "Optional" Colour route_color "Optional" Colour route_text_color "Optional" Integer route_sort_order "Optional" Enum continuous_pickup "Optional*" Enum continuous_drop_off "Optional*" Id network_id "Optional*" }
    "trips (required)" { Primary_key trip_id "Required" Foreign_key route_id "Required" Foreign_key service_id "Required" Foreign_key shape_id "Optional*" Text trip_headsign "Optional" Text trip_short_name "Optional" Enum direction_id "Optional" Id block_id "Optional" Enum wheelchair_accessible "Optional" Enum bikes_allowed "Optional" }
    "stop_times (required)" { Foreign_key trip_id "Required" Foreign_key stop_id "Required*" Foreign_key location_id "Optional*" Foreign_key pickup_booking_rule_id "Optional" Foreign_key drop_off_booking_rule_id "Optional" Time arrival_time "Required*" Time departure_time "Required*" Integer stop_sequence "Required" Text stop_headsign "Optional" Time start_pickup_drop_off_window "Required*" Time end_pickup_drop_off_window "Required*" Enum pickup_type "Optional*" Enum drop_off_type "Optional*" Enum continuous_pickup "Optional*" Enum continuous_drop_off "Optional*" Float shape_dist_traveled "Optional" Enum timepoint "Optional" }
    "calendar (required*)" { Primary_key service_id "Required" Enum monday "Required" Enum tuesday "Required" Enum wednesday "Required" Enum thursday "Required" Enum friday "Required" Enum saturday "Required" Enum sunday "Required" Date start_date "Required" Date end_date "Required" }
    "calendar_dates (required*)" { Primary_key service_id "Required" Date date "Required" Enum exception_type "Required" }
    "frequencies (optional)" { Foreign_key trip_id "Required" Time start_time "Required" Time end_time "Required" Integer headway_secs "Required" Enum exact_times "Optional" }
    "shapes (optional)" { Primary_key shape_id "Required" Latitude shape_pt_lat "Required" Longitude shape_pt_lon "Required" Integer shape_pt_sequence "Required" Float shape_dist_traveled "Optional" }
    "transfers (optional)" { Foreign_key from_stop_id "Required*" Foreign_key to_stop_id "Required*" Foreign_key from_route_id "Optional" Foreign_key to_route_id "Optional" Foreign_key from_trip_id "Required*" Foreign_key to_trip_id "Required*" Enum transfer_type "Required" Integer min_transfer_time "Optional" }
    "feed_info (required*)" { Text feed_publisher_name "Required" URL feed_publisher_url "Required" Language_code feed_lang "Required" Language_code default_lang "Optional" Date feed_start_date "Optional" Date feed_end_date "Optional" Text feed_version "Optional" Email feed_contact_email "Optional" URL feed_contact_url "Optional" }
    "attributions (optional)" { Primary_key attribution_id "Optional" Foreign_key agency_id "Optional" Foreign_key route_id "Optional" Foreign_key trip_id "Optional" Text organization_name "Required" Enum is_producer "Optional" Enum is_operator "Optional" Enum is_authority "Optional" URL attribution_url "Optional" Email attribution_email "Optional" Phone_number attribution_phone "Optional" }
    "translations (optional)" { Enum table_name "Required" Text field_name "Required" Language_code language "Required" Various translation "Required" Various record_id "Required*" Various record_sub_id "Required*" Various field_value "Required*" }
    "stops (required)" |o..|| "stops (required)" : "A stop may have a parent station"
    "stops (required)" |o..|| "levels (optional)" : "A stop may be on a level"
    "pathways (optional)" }|..|| "stops (required)" : "A pathway may connect a source and destination stop"
    "routes (required)" }|--|| "agency (required)" : "Each route belongs to an agency"
    "trips (required)" ||--|| "routes (required)" : "Each trip runs on a route"
    "trips (required)" ||--|| "calendar (required*)" : "Each trip runs on a calendar (if not calendar_date)"
    "trips (required)" ||--|| "calendar_dates (required*)" : "Each trip runs on a calendar_date (if not calendar)"
    "trips (required)" ||..|| "shapes (optional)" : "A trip may be drawn with a shape"
    "stop_times (required)" ||--|| "trips (required)" : "Each stop_time belongs to a trip"
    "stop_times (required)" ||--|| "stops (required)" : "Each stop_time belongs to a stop"
    "frequencies (optional)" ||--|| "trips (required)" : has
    "transfers (optional)" ||--|| "stops (required)" : has
    "transfers (optional)" ||--|| "routes (required)" : has
    "transfers (optional)" ||--|| "trips (required)" : has
    "attributions (optional)" ||--|| "agency (required)" : has
    "attributions (optional)" ||--|| "routes (required)" : has
    "attributions (optional)" ||--|| "trips (required)" : has
That's a big database! Thankfully, you only need to understand the role of a handful of core GTFS files to get started:
- agency.txt: an agency is a public transportation provider, i.e. an organisation that runs a network of buses, trams, trains, subways, etc.
- stops.txt: a stop is a geographic location where vehicles pick up and/or drop off travellers. In real life, these are stations, shelters, sign posts, etc.
- routes.txt: a route is what travellers know as a line, like a bus line: a sequence of stops served by one or more vehicles.
- calendar(_dates).txt: calendars define the days on which a service runs. calendar.txt lists weekdays over a date range, and calendar_dates.txt adds or removes specific dates. They don't make much sense alone but are referenced by trips below.
- trips.txt: a trip is one run of a vehicle along a route, combined with a calendar that says on which days it operates.
- stop_times.txt: a stop time is a specific time when a vehicle arrives at and/or departs from a specific stop for a specific trip, i.e. a departure.
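Here's how those core files combine to answer "what leaves this stop on this date?". This is a minimal sketch, assuming each table is a list of row dicts with string values (as csv.DictReader produces); the function names and the sample IDs in it are hypothetical. It applies the calendar rules (calendar_dates.txt exceptions override calendar.txt, exception_type 1 adds a date, 2 removes it) and sorts times numerically, because GTFS times can have a one-digit hour and can go past 24:00:00.

```python
from datetime import date

# Column names of calendar.txt, in date.weekday() order (Monday = 0).
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def service_runs_on(service_id, day: date, calendar, calendar_dates):
    """True if service_id operates on `day`."""
    key = day.strftime("%Y%m%d")  # GTFS dates are YYYYMMDD strings
    # calendar_dates.txt exceptions take precedence over calendar.txt:
    # exception_type "1" = service added, "2" = service removed on that date.
    for row in calendar_dates:
        if row["service_id"] == service_id and row["date"] == key:
            return row["exception_type"] == "1"
    for row in calendar:
        if row["service_id"] == service_id:
            # YYYYMMDD strings compare in the same order as the dates they encode.
            in_range = row["start_date"] <= key <= row["end_date"]
            return in_range and row[WEEKDAYS[day.weekday()]] == "1"
    return False


def to_seconds(hms):
    """Convert a GTFS "H:MM:SS" or "HH:MM:SS" time (may exceed 24h) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def departures(stop_id, day: date, gtfs):
    """(departure_time, route_id, trip_id) for every trip leaving stop_id on `day`."""
    trips = {trip["trip_id"]: trip for trip in gtfs["trips"]}
    result = []
    for stop_time in gtfs["stop_times"]:
        if stop_time["stop_id"] != stop_id or not stop_time["departure_time"]:
            continue  # another stop, or no scheduled time at this stop
        trip = trips[stop_time["trip_id"]]
        if service_runs_on(trip["service_id"], day,
                           gtfs.get("calendar", []), gtfs.get("calendar_dates", [])):
            result.append((stop_time["departure_time"], trip["route_id"], trip["trip_id"]))
    # Sort numerically: string order would put "7:30:00" after "08:00:00".
    return sorted(result, key=lambda row: to_seconds(row[0]))
```

This is deliberately simplified: a trip departing at, say, 25:10:00 belongs to the previous day's service and would need to be looked up under that date, and frequency-based trips from frequencies.txt aren't expanded.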
The other GTFS files provide additional information about the network such as fares/pricing, inter-route connections, metadata about the feed itself, and much more.
What is GTFS Realtime?
It's a series of feeds that enrich a GTFS dataset. They're usually available through an API that responds with data encoded in Protocolbuffer Binary Format (PBF), which can be decoded into JSON data. Let's unpack this.
- a series of feeds...
There are three of them:
- Trip Updates: These are realtime updates to GTFS scheduled trips, including delays, cancellations, skipped stops, etc.
- Service Alerts: These are messages announcing, describing, or explaining changes to the network, such as road works, accidents, broken-down vehicles, demonstrations, etc.
- Vehicle Positions: This feed gives the realtime location of vehicles, plus their description (e.g. vehicle type) and status information (e.g. occupancy rate).
There is no interdependency between the feeds: each can be used alone, but they provide the most value when used together. For example, a Service Alert announces a traffic jam on a route causing a delay, the Trip Update factors that delay into a realtime estimated time of arrival, and the Vehicle Position shows the vehicle's speed and progress along the route.
- ...that enrich a GTFS dataset...
It's important to note that a GTFS Realtime feed makes no sense without the underlying GTFS Schedule data. The realtime feeds communicate changes by referencing IDs (trip, route, stop, etc.) in the static dataset.
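Because realtime data only points at static IDs, a useful sanity check is to look for IDs that the static dataset doesn't know about. Here's a minimal sketch over a decoded Trip Updates feed held as nested dicts; it assumes the PascalCase key names of the decoded example later in this article (other decoders may produce different casing, e.g. trip_update), and the function name is hypothetical. Note that some unknown trip IDs can be legitimate, for instance trips added in realtime that don't exist in the schedule.

```python
def unknown_references(feed, trip_ids, stop_ids):
    """Trip and stop IDs used by a decoded Trip Updates feed but absent from GTFS Schedule.

    feed: decoded feed as nested dicts (PascalCase keys assumed).
    trip_ids, stop_ids: sets of IDs taken from trips.txt and stops.txt.
    """
    missing_trips, missing_stops = set(), set()
    for entity in feed.get("Entity", []):
        update = entity.get("TripUpdate")
        if update is None:
            continue  # this sketch only checks Trip Updates
        trip_id = update.get("Trip", {}).get("TripId")
        if trip_id and trip_id not in trip_ids:
            missing_trips.add(trip_id)
        for stop_update in update.get("StopTimeUpdate", []):
            stop_id = stop_update.get("StopId")
            if stop_id and stop_id not in stop_ids:
                missing_stops.add(stop_id)
    return missing_trips, missing_stops
```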
- ...usually available through an API...
Follow the instructions on the Swiss opendata platform to see for yourself. Just create an account (it's free and instant), generate a token for their API, and make a request to their gtfs-rt endpoint (use Insomnia or Postman to pass the required Authorization header).
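If you'd rather script the request than use Insomnia or Postman, Python's standard library is enough. This is a sketch under assumptions: the URL and token are placeholders for the values shown on the platform, and sending the raw token as the Authorization header value is an assumption to check against the platform's instructions (some APIs expect "Bearer <token>" instead).

```python
import urllib.request

GTFS_RT_URL = "https://example.org/gtfs-rt"  # hypothetical: use the platform's endpoint
API_TOKEN = "YOUR_TOKEN"                     # hypothetical: the token you generated


def build_request(url, token):
    # Assumption: the token goes into the Authorization header as-is.
    return urllib.request.Request(url, headers={"Authorization": token})


def fetch_feed(url, token, timeout=30):
    """Download the raw GTFS Realtime response as bytes (still protobuf-encoded)."""
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as response:
        return response.read()
```

Calling fetch_feed(GTFS_RT_URL, API_TOKEN) returns binary data, not text, so it still has to be decoded before it's readable.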
- ...that responds with Protocolbuffer Binary Format (PBF)...
There's a lot of information online about what PBF is, but the only thing that matters here is that it's a compact binary encoding, which makes responses smaller, hence faster. The Swiss API response is only 2.7 MB and takes less than 1 second to load, while the same data as JSON weighs over 20 MB and takes more than three times as long to load.
The downside is that it's not human-readable and has to be decoded first.
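Part of why the binary format is so compact: on the wire, protobuf identifies fields by small numeric tags instead of names like "Timestamp", and encodes integers as variable-length base-128 "varints". The sketch below implements just that varint encoding to illustrate the size difference; it is not a GTFS Realtime decoder. In practice you'd decode a feed with a generated protobuf parser, such as Google's gtfs-realtime-bindings package for Python.

```python
def encode_varint(value):
    """Encode a non-negative int as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F          # take the low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode a varint at the start of `data`; return (value, bytes_used)."""
    result = shift = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, i + 1
        shift += 7
    raise ValueError("truncated varint")


timestamp = 1714382870  # the header timestamp of the decoded example feed
encoded = encode_varint(timestamp)
print(len(encoded), "bytes vs", len(str(timestamp)), "characters as JSON text")
# prints: 5 bytes vs 10 characters as JSON text
```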
- ...JSON data.
After decoding, a GTFS Realtime feed looks like this (here, an excerpt of a Trip Updates feed):
{
  "Header": {
    "GtfsRealtimeVersion": "1.0",
    "Incrementality": "FullDataset",
    "Timestamp": 1714382870
  },
  "Entity": [
    {
      "Id": "801.TA.91-1-D-j24-1.1310.R",
      "IsDeleted": false,
      "TripUpdate": {
        "Trip": {
          "TripId": "801.TA.91-1-D-j24-1.1310.R",
          "RouteId": "91-1-D-j24-1",
          "StartTime": "07:07:00",
          "StartDate": "20240429",
          "ScheduleRelationship": "Scheduled"
        },
        "StopTimeUpdate": [
          {
            "StopSequence": 1,
            "StopId": "8506302:0:1",
            "Departure": {
              "Delay": 60
            },
            "ScheduleRelationship": "Scheduled"
          },
          {
            "StopSequence": 2,
            "StopId": "8506210:0:4",
            "Arrival": {
              "Delay": 0
            },
            "Departure": {
              "Delay": 60
            },
            "ScheduleRelationship": "Scheduled"
          }
        ]
      }
    }
  ]
}
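The Delay values above are in seconds relative to the schedule, so an estimated time is simply the scheduled time from stop_times.txt plus the delay. Stops with no update of their own inherit the last known delay, which is a simplified reading of the delay-propagation rule in the GTFS Realtime reference. In this sketch, updates are matched by StopSequence, skipped stops and other ScheduleRelationship values are ignored, and the scheduled times in the usage note are hypothetical since they aren't part of the excerpt.

```python
def _shift(seconds, delay):
    """Apply a delay in seconds, or return None when there is no prediction."""
    return None if delay is None else seconds + delay


def estimate_times(scheduled, trip_update):
    """Estimated (stop_id, arrival, departure), in seconds after midnight.

    scheduled: [(stop_sequence, stop_id, arrival_s, departure_s), ...] for one trip,
               sorted by stop_sequence (from stop_times.txt).
    trip_update: the decoded "TripUpdate" object of that trip.
    """
    updates = {u["StopSequence"]: u for u in trip_update.get("StopTimeUpdate", [])}
    delay = None  # no prediction before the first update
    estimates = []
    for sequence, stop_id, arrival, departure in scheduled:
        update = updates.get(sequence, {})
        if "Delay" in update.get("Arrival", {}):
            delay = update["Arrival"]["Delay"]
        arrival_delay = delay
        if "Delay" in update.get("Departure", {}):
            delay = update["Departure"]["Delay"]
        # A stop without its own update keeps the last known delay.
        estimates.append((stop_id, _shift(arrival, arrival_delay), _shift(departure, delay)))
    return estimates


def hms(hours, minutes, seconds=0):
    """Build a time of day in seconds, e.g. hms(7, 7) for 07:07:00."""
    return hours * 3600 + minutes * 60 + seconds
```

Assuming the trip above were scheduled to leave its first stop at 07:07, the 60-second departure delay gives an estimated 07:08; a later stop with no update of its own would also be predicted one minute late.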
What is GTFS Flex?
🚧
Getting started with GTFS
To develop software using GTFS data, you need three things:
- One or more source feeds. A quick Google search turns up plenty of them online, either publicly accessible or available after a free registration (like here, here, or here).
- A database to import the feed(s) into. Here's a tutorial to ingest GTFS Schedule data using NodeJS and PostgreSQL.
- An app built on that data. Explore our GTFS-based applications and related articles for inspiration and help.