Title: | Read, Validate, Analyze, and Map GTFS Feeds |
---|---|
Description: | Read General Transit Feed Specification (GTFS) zipfiles into a list of R dataframes. Perform validation of the data structure against the specification. Analyze the headways and frequencies at routes and stops. Create maps and perform spatial analysis on the routes and stops. Please see the GTFS documentation here for more detail: <https://gtfs.org/>. |
Authors: | Flavio Poletti [aut, cre], Daniel Herszenhut [aut], Mark Padgham [aut], Tom Buckley [aut], Danton Noriega-Goodwin [aut], Angela Li [ctb], Elaine McVey [ctb], Charles Hans Thompson [ctb], Michael Sumner [ctb], Patrick Hausmann [ctb], Bob Rudis [ctb], James Lamb [ctb], Alexandra Kapp [ctb], Kearey Smith [ctb], Dave Vautin [ctb], Kyle Walker [ctb], Davis Vaughan [ctb], Ryan Rymarczyk [ctb], Kirill Müller [ctb] |
Maintainer: | Flavio Poletti <[email protected]> |
License: | GPL |
Version: | 1.7.0 |
Built: | 2025-01-20 16:43:01 UTC |
Source: | https://github.com/r-transit/tidytransit |
Convert another gtfs-like object to a tidygtfs object
as_tidygtfs(x, ...)
x |
gtfs object |
... |
ignored |
a tidygtfs object
Finds clusters of stops for each unique value in group_col (e.g. stop_name). Can be used to find different groups of stops that share the same name but are located more than max_dist apart. gtfs_stops is assigned a new column (named cluster_colname) which contains the group_col value and the cluster number.
cluster_stops(
  gtfs_stops,
  max_dist = 300,
  group_col = "stop_name",
  cluster_colname = "stop_name_cluster"
)
gtfs_stops |
Stops table of a gtfs object. It is also possible to pass a tidygtfs object to enable piping. |
max_dist |
Only stop groups that have a maximum distance among them above this threshold (in meters) are clustered. |
group_col |
Clusters are calculated for each set of stops with the same value in this column (default: stop_name) |
cluster_colname |
Name of the new column. Can be the same as group_col to overwrite it. |
stats::kmeans() is used for clustering.
Returns a stops table with an added cluster column. If gtfs_stops is a tidygtfs object, a modified tidygtfs object is returned.
library(tidytransit)
library(dplyr)
nyc_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(nyc_path)
nyc <- cluster_stops(nyc)

# There are 6 stops with the name "86 St" that are far apart
stops_86_St = nyc$stops %>%
  filter(stop_name == "86 St")

table(stops_86_St$stop_name_cluster)

stops_86_St %>%
  select(stop_id, stop_name, parent_station, stop_name_cluster) %>%
  head()

library(ggplot2)
ggplot(stops_86_St) +
  geom_point(aes(stop_lon, stop_lat, color = stop_name_cluster))
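The clustering idea can be illustrated with a small base-R sketch. It is not tidytransit's implementation: the toy stops, the fixed choice of two clusters and the "name [cluster]" label format are assumptions, and lon/lat are treated as planar coordinates, which is only a reasonable approximation over short distances.

```r
# Hypothetical toy data: four stops named "86 St" forming two far-apart pairs
stops <- data.frame(
  stop_id   = c("a1", "a2", "b1", "b2"),
  stop_name = "86 St",
  stop_lon  = c(-73.9760, -73.9762, -73.9540, -73.9542),
  stop_lat  = c(40.7850, 40.7851, 40.7790, 40.7791),
  stringsAsFactors = FALSE
)

# k-means on raw lon/lat (planar approximation); k = 2 is an assumption here
set.seed(42)
km <- stats::kmeans(stops[, c("stop_lon", "stop_lat")], centers = 2, nstart = 10)

# Hypothetical label format: group value plus cluster number
stops$stop_name_cluster <- paste0(stops$stop_name, " [", km$cluster, "]")
print(stops)
```

The two nearby stops of each pair end up with the same label, while the pairs themselves are separated.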
Convert empty strings ("") to NA values in all gtfs tables
empty_strings_to_na(gtfs_obj)
gtfs_obj |
gtfs feed (tidygtfs object) |
a gtfs_obj where all empty strings in tables have been replaced with NA
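A base-R sketch of the same idea, applied to a hypothetical list of tables standing in for a feed. The helper name empty_to_na is illustrative only and is not part of tidytransit; only character columns are touched.

```r
# Hypothetical feed: a named list of data frames, as in a gtfs object
feed <- list(
  stops  = data.frame(stop_id = c("a", "b"), stop_code = c("", "B1"),
                      stringsAsFactors = FALSE),
  routes = data.frame(route_id = "r1", route_desc = "",
                      stringsAsFactors = FALSE)
)

# Replace "" with NA in every character column of one table
empty_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) col[col == ""] <- NA
    col
  })
  df
}

# Apply to every table; lapply keeps the table names
feed_na <- lapply(feed, empty_to_na)
str(feed_na)
```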
Filter a feed to the trips passing through area. Only stop_times, stops, routes, services (in calendar and calendar_dates), shapes, frequencies and transfers belonging to one of those trips are kept.
filter_feed_by_area(gtfs_obj, area)
gtfs_obj |
gtfs feed (tidygtfs object) |
area |
all trips passing through this area are kept. Either a bounding box (numeric vector with xmin, ymin, xmax, ymax) or an sf object. |
tidygtfs object with filtered tables
See also: filter_feed_by_stops, filter_feed_by_trips, filter_feed_by_date
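The bounding-box case can be sketched in base R: find the stops inside the box, keep every trip that serves at least one of them, then keep all stop_times of those trips (including their stops outside the box). The toy tables and coordinates are hypothetical, and this sketch ignores sf areas and the remaining tables that tidytransit also filters.

```r
# Hypothetical toy tables
stops <- data.frame(stop_id  = c("s1", "s2", "s3"),
                    stop_lon = c(8.50, 8.60, 9.50),
                    stop_lat = c(47.30, 47.40, 47.90),
                    stringsAsFactors = FALSE)
stop_times <- data.frame(trip_id = c("t1", "t1", "t2", "t2", "t3", "t3"),
                         stop_id = c("s1", "s2", "s2", "s3", "s1", "s3"),
                         stringsAsFactors = FALSE)

# Bounding box in the documented order: xmin, ymin, xmax, ymax
area <- c(8.55, 47.35, 8.65, 47.45)

in_area <- stops$stop_lon >= area[1] & stops$stop_lon <= area[3] &
  stops$stop_lat >= area[2] & stops$stop_lat <= area[4]

# Trips passing through the area: at least one of their stops lies in the box
trips_kept <- unique(stop_times$trip_id[stop_times$stop_id %in% stops$stop_id[in_area]])

# Keep whole trips, not just the stop_times inside the box
stop_times_kept <- stop_times[stop_times$trip_id %in% trips_kept, ]
print(trips_kept)
```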
Filter a feed to the trips running on extract_date (optionally restricted to a time window). Only stop_times, stops, routes, services (in calendar and calendar_dates), shapes, frequencies and transfers belonging to one of those trips are kept.
filter_feed_by_date(
  gtfs_obj,
  extract_date,
  min_departure_time,
  max_arrival_time
)
gtfs_obj |
gtfs feed (tidygtfs object) |
extract_date |
date to extract trips from this day (Date or "YYYY-MM-DD" string) |
min_departure_time |
(optional) The earliest departure time. Can be given as "HH:MM:SS", hms object or numeric value in seconds. |
max_arrival_time |
(optional) The latest arrival time. Can be given as "HH:MM:SS", hms object or numeric value in seconds. |
tidygtfs object with filtered tables
See also: filter_stop_times, filter_feed_by_trips
Filter a feed to the trips passing through the given stops. Only stop_times, stops, routes, services (in calendar and calendar_dates), shapes, frequencies and transfers belonging to one of those trips are kept.
filter_feed_by_stops(gtfs_obj, stop_ids = NULL, stop_names = NULL)
gtfs_obj |
gtfs feed (tidygtfs object) |
stop_ids |
vector with stop_ids. Provide either stop_ids or stop_names |
stop_names |
vector with stop_names (will be converted to stop_ids) |
tidygtfs object with filtered tables
The returned gtfs_obj likely contains more than just the stops given (i.e. all stops that belong to a trip passing the initial stop).
See also: filter_feed_by_trips, filter_feed_by_date
Filter a feed to the given trip_ids. Only stop_times, stops, routes, services (in calendar and calendar_dates), shapes, frequencies and transfers belonging to one of those trips are kept.
filter_feed_by_trips(gtfs_obj, trip_ids)
gtfs_obj |
gtfs feed (tidygtfs object) |
trip_ids |
vector with trip_ids |
tidygtfs object with filtered tables
See also: filter_feed_by_stops, filter_feed_by_area, filter_feed_by_date
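Restricting the other tables to a set of trips amounts to a chain of semi-joins. A minimal base-R sketch with hypothetical toy tables; only trips, stop_times, stops and routes are shown, and the assumption is that calendar, shapes, frequencies and transfers would be cut down in the same referential way.

```r
# Hypothetical toy tables
routes <- data.frame(route_id = c("r1", "r2"), stringsAsFactors = FALSE)
trips <- data.frame(trip_id  = c("t1", "t2", "t3"),
                    route_id = c("r1", "r1", "r2"),
                    stringsAsFactors = FALSE)
stop_times <- data.frame(trip_id = c("t1", "t1", "t2", "t3", "t3"),
                         stop_id = c("s1", "s2", "s2", "s3", "s4"),
                         stringsAsFactors = FALSE)
stops <- data.frame(stop_id = c("s1", "s2", "s3", "s4"), stringsAsFactors = FALSE)

trip_ids <- c("t1", "t2")

# Each table keeps only rows referenced by an already filtered table;
# drop = FALSE keeps single-column tables as data frames
trips_f      <- trips[trips$trip_id %in% trip_ids, , drop = FALSE]
stop_times_f <- stop_times[stop_times$trip_id %in% trips_f$trip_id, , drop = FALSE]
stops_f      <- stops[stops$stop_id %in% stop_times_f$stop_id, , drop = FALSE]
routes_f     <- routes[routes$route_id %in% trips_f$route_id, , drop = FALSE]
```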
Filter a stop_times table for a given date and timespan.
filter_stop_times(gtfs_obj, extract_date, min_departure_time, max_arrival_time)
gtfs_obj |
gtfs feed (tidygtfs object) |
extract_date |
date to extract trips from this day (Date or "YYYY-MM-DD" string) |
min_departure_time |
(optional) The earliest departure time. Can be given as "HH:MM:SS", hms object or numeric value in seconds. |
max_arrival_time |
(optional) The latest arrival time. Can be given as "HH:MM:SS", hms object or numeric value in seconds. |
Filtered stop_times data.table for travel_times() and raptor().
library(tidytransit)
feed_path <- system.file("extdata", "routing.zip", package = "tidytransit")
g <- read_gtfs(feed_path)

# filter the sample feed
stop_times <- filter_stop_times(g, "2018-10-01", "06:00:00", "08:00:00")
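Time arguments such as "HH:MM:SS" end up being compared as seconds after midnight. GTFS allows hours of 24 and above for trips that run past midnight, so converting through date-time parsing would fail for such values. A small base-R sketch; the helper hhmmss_to_seconds is hypothetical, and the single-column window filter simplifies the separate departure and arrival bounds used by filter_stop_times().

```r
# Convert GTFS "HH:MM:SS" strings to seconds after midnight.
# Hours may be >= 24 for trips running past midnight, so no date-time parsing.
hhmmss_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

departures <- c("05:59:00", "06:00:00", "07:30:00", "08:00:01", "25:10:00")
secs <- hhmmss_to_seconds(departures)

# Keep times inside an inclusive 06:00:00-08:00:00 window
in_window <- secs >= hhmmss_to_seconds("06:00:00") &
  secs <= hhmmss_to_seconds("08:00:00")
departures[in_window]  # "06:00:00" "07:30:00"
```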
Get a set of stops for a given set of service ids and route ids
filter_stops(gtfs_obj, service_ids, route_ids)
gtfs_obj |
gtfs feed (tidygtfs object) |
service_ids |
the service_ids for which to get stops |
route_ids |
the route_ids for which to get stops |
stops table for a given service or route
library(tidytransit)
library(dplyr)
local_gtfs_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(local_gtfs_path)
select_service_id <- filter(nyc$calendar, monday == 1) %>% pull(service_id)
select_route_id <- sample_n(nyc$routes, 1) %>% pull(route_id)
filtered_stops_df <- filter_stops(nyc, select_service_id, select_route_id)
Calculate the number of departures and mean headways for routes within a given timespan and for given service_ids.
get_route_frequency(
  gtfs_obj,
  start_time = "06:00:00",
  end_time = "22:00:00",
  service_ids = NULL
)
gtfs_obj |
gtfs feed (tidygtfs object) |
start_time |
analysis start time, can be given as "HH:MM:SS", hms object or numeric value in seconds. |
end_time |
analysis period end time, can be given as "HH:MM:SS", hms object or numeric value in seconds. |
service_ids |
A set of service_ids from the calendar dataframe. If not provided, the service_id with the most departures is used. |
a dataframe of routes with variables for headway/frequency in seconds for a route within a given time frame
Some GTFS feeds already contain a frequencies table. If so, consider using it instead, as it is likely to be more accurate than what tidytransit calculates.
library(tidytransit)
data(gtfs_duke)
routes_frequency <- get_route_frequency(gtfs_duke)
x <- order(routes_frequency$median_headways)
head(routes_frequency[x, ])
Get all trip shapes for a given route and service
get_route_geometry(gtfs_sf_obj, route_ids = NULL, service_ids = NULL)
gtfs_sf_obj |
tidytransit gtfs object with sf data frames |
route_ids |
routes to extract |
service_ids |
service_ids to extract |
an sf dataframe for gtfs routes with a row/linestring for each trip
library(tidytransit)
data(gtfs_duke)
gtfs_duke_sf <- gtfs_as_sf(gtfs_duke)
routes_sf <- get_route_geometry(gtfs_duke_sf)
plot(routes_sf[c(1, 1350), ])
Calculate the number of departures and mean headways for all stops within a given timespan and for given service_ids.
get_stop_frequency(
  gtfs_obj,
  start_time = "06:00:00",
  end_time = "22:00:00",
  service_ids = NULL,
  by_route = TRUE
)
gtfs_obj |
gtfs feed (tidygtfs object) |
start_time |
analysis start time, can be given as "HH:MM:SS", hms object or numeric value in seconds. |
end_time |
analysis period end time, can be given as "HH:MM:SS", hms object or numeric value in seconds. |
service_ids |
A set of service_ids from the calendar dataframe. If not provided, the service_id with the most departures is used. |
by_route |
Default TRUE, if FALSE then calculate headway for any line coming through the stop in the same direction on the same schedule. |
dataframe of stops with the number of departures and the mean headway (timespan divided by the number of departures) in seconds as columns
Some GTFS feeds already contain a frequencies table. If so, consider using it instead, as it is likely to be more accurate than what tidytransit calculates.
library(tidytransit)
data(gtfs_duke)
stop_frequency <- get_stop_frequency(gtfs_duke)
x <- order(stop_frequency$mean_headway)
head(stop_frequency[x, ])
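Following the definition in the return value above, the mean headway is the length of the analysis window divided by the number of departures within it. A base-R sketch with hypothetical departure times for a single stop and route; it only illustrates the arithmetic, not tidytransit's grouping by route, direction and service.

```r
# Hypothetical departures at one stop, in seconds after midnight
departures <- c(6.00, 6.25, 6.50, 7.00, 8.00, 23.00) * 3600

start_time <- 6 * 3600   # "06:00:00", the documented default
end_time   <- 22 * 3600  # "22:00:00", the documented default

# Count departures inside the analysis window (23:00 is excluded)
n_departures <- sum(departures >= start_time & departures <= end_time)

# Mean headway in seconds: window length / number of departures
mean_headway <- (end_time - start_time) / n_departures
n_departures   # 5
mean_headway   # 11520 seconds, i.e. 3.2 hours
```

The result also shows how a window-based mean can differ from the gaps riders actually see: the first departures here are only 15 to 60 minutes apart.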
Get all trip shapes for given trip ids
get_trip_geometry(gtfs_sf_obj, trip_ids)
gtfs_sf_obj |
tidytransit gtfs object with sf data frames |
trip_ids |
trip_ids to extract shapes |
an sf dataframe for gtfs routes with a row/linestring for each trip
library(tidytransit)
data(gtfs_duke)
gtfs_duke <- gtfs_as_sf(gtfs_duke)
trips_sf <- get_trip_geometry(gtfs_duke, c("t_726295_b_19493_tn_41", "t_726295_b_19493_tn_40"))
plot(trips_sf[1, "shape_id"])
Stops are converted to POINT sf data frames. Shapes are converted to a LINESTRING data frame. Note that this function replaces the stops and shapes tables in gtfs_obj.
gtfs_as_sf(gtfs_obj, skip_shapes = FALSE, crs = NULL, quiet = TRUE)
gtfs_obj |
gtfs feed (tidygtfs object, created by |
skip_shapes |
if TRUE, shapes are not converted. Default FALSE. |
crs |
optional coordinate reference system (used by sf::st_transform) to transform lon/lat coordinates of stops and shapes |
quiet |
boolean whether to suppress status messages |
tidygtfs object with stops and shapes as sf dataframes
See also: sf_as_tbl, stops_as_sf, shapes_as_sf
Data obtained from https://data.trilliumtransit.com/gtfs/duke-nc-us/duke-nc-us.zip.
gtfs_duke
An object of class tidygtfs (inherits from gtfs) of length 25.
Transform coordinates of a gtfs feed
gtfs_transform(gtfs_obj, crs)
gtfs_obj |
gtfs feed (tidygtfs object) |
crs |
target coordinate reference system, used by sf::st_transform |
tidygtfs object with transformed stops and shapes sf dataframes
gtfs object with transformed sf tables
Interpolate missing stop_times linearly
interpolate_stop_times(x, use_shape_dist = TRUE)
x |
tidygtfs object or stop_times table |
use_shape_dist |
If TRUE, use |
tidygtfs or stop_times with interpolated arrival and departure times
## Not run:
library(tidytransit)
data(gtfs_duke)
print(gtfs_duke$stop_times[1:5, 1:5])

gtfs_duke_2 = interpolate_stop_times(gtfs_duke)
print(gtfs_duke_2$stop_times[1:5, 1:5])

gtfs_duke_3 = interpolate_stop_times(gtfs_duke, FALSE)
print(gtfs_duke_3$stop_times[1:5, 1:5])
## End(Not run)
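The two modes of use_shape_dist can be illustrated with stats::approx(), which performs linear interpolation. This is a sketch on a hypothetical single trip, not tidytransit's code, and it assumes (the argument description above is truncated) that use_shape_dist = TRUE interpolates along shape_dist_traveled while FALSE spaces missing times evenly by stop_sequence.

```r
# Hypothetical stop_times of one trip; NA marks missing intermediate times
st <- data.frame(
  stop_sequence       = 1:5,
  shape_dist_traveled = c(0, 100, 300, 600, 1000),
  arrival_time        = c(0, NA, NA, NA, 1000)  # seconds after trip start
)

# Interpolate along the distance travelled (assumed use_shape_dist = TRUE analogue);
# approx() drops the NA pairs and interpolates between the known points
by_dist <- approx(st$shape_dist_traveled, st$arrival_time,
                  xout = st$shape_dist_traveled)$y

# Interpolate evenly between stops (assumed use_shape_dist = FALSE analogue)
by_seq <- approx(st$stop_sequence, st$arrival_time,
                 xout = st$stop_sequence)$y

by_dist  # 0 100 300 600 1000
by_seq   # 0 250 500 750 1000
```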
Convert NA values to empty strings ("")
na_to_empty_strings(gtfs_obj)
gtfs_obj |
gtfs feed (tidygtfs object) |
a gtfs_obj where all NA strings in tables have been replaced with ""
Plot GTFS stops and trips
## S3 method for class 'tidygtfs'
plot(x, ...)
x |
a tidygtfs object as read by |
... |
ignored for tidygtfs |
plot
library(tidytransit)
local_gtfs_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(local_gtfs_path)
plot(nyc)
Prints a GTFS object, suppressing the class attribute and hiding the validation_result attribute created with validate_gtfs().
## S3 method for class 'tidygtfs'
print(x, ...)
x |
a tidygtfs object as read by |
... |
Optional arguments ultimately passed to |
The GTFS object that was printed, invisibly
## Not run:
library(tidytransit)
path = system.file("extdata", "nyc_subway.zip", package = "tidytransit")
g = read_gtfs(path)
print(g)
## End(Not run)
raptor finds the minimal travel time, earliest or latest arrival time for all stops in stop_times with journeys departing from stop_ids within time_range.
raptor(
  stop_times,
  transfers,
  stop_ids,
  arrival = FALSE,
  time_range = 3600,
  max_transfers = NULL,
  keep = "all"
)
stop_times |
A (prepared) stop_times table from a gtfs feed. Prepared means
that all stop time rows before the desired journey departure time
should be removed. The table should also only include departures
happening on one day. Use |
transfers |
Transfers table from a gtfs feed. In general no preparation
is needed. Can be omitted if stop_times has been prepared with
|
stop_ids |
Character vector with stop_ids from where journeys should start (or end). It is recommended to only use stop_ids that are related to each other, like different platforms in a train station or bus stops that are reasonably close to each other. |
arrival |
If FALSE (default), all journeys start from |
time_range |
Either a range in seconds or a vector containing the minimal and maximal
departure time (i.e. earliest and latest possible journey departure time)
as seconds or "HH:MM:SS" character. If |
max_transfers |
Maximum number of transfers allowed, no limit (NULL) as default. |
keep |
One of c("all", "shortest", "earliest", "latest"). By default, |
Using a modified Round-Based Public Transit Routing Algorithm (RAPTOR) implemented with data.table, earliest arrival times for all stops are calculated. If two journeys arrive at the same time, the one with the later departure time, and thus the shorter travel time, is kept. By default, all journeys departing within time_range that arrive at a stop are returned in a table. If you want all journeys arriving at stop_ids within the specified time range, set arrival to TRUE.
Journeys are defined by a "from" and "to" stop_id, a departure, arrival and travel time. Note that exact journeys (with each intermediate stop and route ids for example) are not returned.
In most cases, stop_times needs to be filtered, as it should only contain trips happening on a single day (see filter_stop_times()). The algorithm scans all trips until it exceeds max_transfers or all trips in stop_times have been visited.
A data.table with journeys (departure, arrival and travel time) to/from all stop_ids reachable by stop_ids.
See travel_times() for easier access to travel time calculations via stop_names.
library(tidytransit)
nyc_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(nyc_path)

# you can use initial walk times to different stops in walking distance (arbitrary example values)
stop_ids_harlem_st <- c("301", "301N", "301S")
stop_ids_155_st <- c("A11", "A11N", "A11S", "D12", "D12N", "D12S")
walk_times <- data.frame(stop_id = c(stop_ids_harlem_st, stop_ids_155_st),
                         walk_time = c(rep(600, 3), rep(410, 6)),
                         stringsAsFactors = FALSE)

# Use journeys departing after 7 AM with arrival time before 9 AM on 26th of June
stop_times <- filter_stop_times(nyc, "2018-06-26", 7*3600, 9*3600)

# calculate all journeys departing from Harlem St or 155 St between 7:00 and 7:30
rptr <- raptor(stop_times, nyc$transfers, walk_times$stop_id, time_range = 1800,
               keep = "all")

# add walk times to travel times
rptr <- merge(rptr, walk_times, by.x = "from_stop_id", by.y = "stop_id")
rptr$travel_time_incl_walk <- rptr$travel_time + rptr$walk_time

# get minimal travel times (with walk times) for all stop_ids
library(data.table)
shortest_travel_times <- setDT(rptr)[order(travel_time_incl_walk)][, .SD[1], by = "to_stop_id"]
hist(shortest_travel_times$travel_time, breaks = seq(0, 2*60)*60)
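To make "earliest arrival time" concrete, here is a sketch of the Connection Scan Algorithm, a simpler relative of RAPTOR that answers the same single-departure earliest-arrival question. It is deliberately not RAPTOR and not tidytransit's implementation: the connections are hypothetical, changing vehicles takes no time, footpaths from a transfers table are ignored, and there is no time_range or max_transfers.

```r
# Hypothetical elementary connections: one vehicle hop from_stop -> to_stop
connections <- data.frame(
  trip_id   = c("A", "A", "B", "C"),
  from_stop = c("s1", "s2", "s2", "s1"),
  to_stop   = c("s2", "s3", "s4", "s4"),
  dep       = c(100, 200, 250, 150),  # seconds
  arr       = c(190, 300, 400, 500),
  stringsAsFactors = FALSE
)

# Earliest arrival at every stop when leaving `source` at time t0.
# Connection Scan: visit connections once, in order of departure time.
earliest_arrival <- function(connections, source, t0) {
  connections <- connections[order(connections$dep), ]
  stops <- unique(c(connections$from_stop, connections$to_stop))
  arrival <- setNames(rep(Inf, length(stops)), stops)
  arrival[source] <- t0
  reached_trips <- character(0)  # trips we can already be sitting in
  for (i in seq_len(nrow(connections))) {
    cn <- connections[i, ]
    # Usable if already on this trip or if we reach from_stop before it departs
    if (cn$trip_id %in% reached_trips || arrival[[cn$from_stop]] <= cn$dep) {
      reached_trips <- union(reached_trips, cn$trip_id)
      if (cn$arr < arrival[[cn$to_stop]]) arrival[[cn$to_stop]] <- cn$arr
    }
  }
  arrival
}

arrival_times <- earliest_arrival(connections, "s1", 90)
print(arrival_times)
```

The direct trip C would reach s4 at 500, but changing from trip A to trip B at s2 arrives at 400. In RAPTOR, that kind of improvement is found in a later round, since each additional round allows one more vehicle trip.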
Reads a GTFS feed from either a local .zip file or a URL and validates it against the GTFS specification.
read_gtfs(path, files = NULL, quiet = TRUE, ...)
path |
The path to a GTFS |
files |
A character vector containing the text files to be validated against the GTFS
specification without the file extension ( |
quiet |
Whether to hide log messages and progress bars (defaults to TRUE). |
... |
Can be used to pass on arguments to |
A tidygtfs object: a list of tibbles in which each entry represents a GTFS text file. Additional tables are stored in the "." sublist.
## Not run:
library(tidytransit)
local_gtfs_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
gtfs <- read_gtfs(local_gtfs_path)
summary(gtfs)

gtfs <- read_gtfs(local_gtfs_path, files = c("trips", "stop_times"))
names(gtfs)
## End(Not run)
Extended GTFS Route Types: https://developers.google.com/transit/gtfs/reference/extended-route-types
route_type_names
A data frame with 136 rows and 2 variables:
the id of route type
name of the gtfs route type
https://gist.github.com/derhuerst/b0243339e22c310bee2386388151e11e
Each trip has a defined number of dates it runs on. This set of dates is called a service pattern in tidytransit. Trips with the same servicepattern id run on the same dates. In general, service_id can work this way, but this is not enforced by the GTFS standard.
set_servicepattern(
  gtfs_obj,
  id_prefix = "s_",
  hash_algo = "md5",
  hash_length = 7
)
gtfs_obj |
gtfs feed (tidygtfs object) |
id_prefix |
all servicepattern ids will start with this string |
hash_algo |
hashing algorithm used by digest |
hash_length |
length the hash should be cut to with |
modified gtfs_obj with added servicepattern list and a table linking trips and pattern (trip_servicepatterns), added to the gtfs_obj$. sublist.
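The grouping behind service patterns can be sketched in base R: build one key per service_id from its sorted set of dates, then give identical keys the same id. tidytransit hashes the dates with digest (see hash_algo); this sketch uses a plain running number instead, and the tables are hypothetical.

```r
# Hypothetical service dates (as could be derived from calendar/calendar_dates)
service_dates <- data.frame(
  service_id = c("wk", "wk", "wk2", "wk2", "sat"),
  date = as.Date(c("2018-06-25", "2018-06-26", "2018-06-25", "2018-06-26", "2018-06-30")),
  stringsAsFactors = FALSE
)
trips <- data.frame(trip_id = c("t1", "t2", "t3"),
                    service_id = c("wk", "wk2", "sat"),
                    stringsAsFactors = FALSE)

# One key per service_id: its sorted dates joined into a single string
keys <- tapply(service_dates$date, service_dates$service_id,
               function(d) paste(sort(d), collapse = ","))

# Identical date sets share a pattern id ("s_" mirrors the default id_prefix)
pattern_ids <- paste0("s_", match(keys, unique(keys)))
names(pattern_ids) <- names(keys)

trips$servicepattern_id <- unname(pattern_ids[trips$service_id])
print(trips)
```

Here the service_ids "wk" and "wk2" run on the same dates, so trips t1 and t2 share a service pattern even though their service_ids differ.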
Coordinates are transformed to lon/lat columns (stop_lon/stop_lat or shape_pt_lon/shape_pt_lat).
sf_as_tbl(gtfs_obj)
gtfs_obj |
gtfs feed (tidygtfs object) |
tidygtfs object with stops and shapes converted to tibbles
Convert shapes into Simple Features Linestrings
shapes_as_sf(gtfs_shapes, crs = NULL)
gtfs_shapes |
a gtfs$shapes dataframe |
crs |
optional coordinate reference system (used by sf::st_transform) to transform lon/lat coordinates |
an sf dataframe for gtfs shapes
Calculate distances between a given set of stops
stop_distances(gtfs_stops)
gtfs_stops |
gtfs stops table either as data frame (with at least |
Returns a data.frame with each row containing a pair of stop_ids (columns from_stop_id and to_stop_id) and the distance between them (in meters).
The resulting data.frame has nrow(gtfs_stops)^2 rows, so calculating distances among all stops of a large feed should be avoided.
## Not run:
library(tidytransit)
library(dplyr)
nyc_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(nyc_path)

nyc$stops %>%
  filter(stop_name == "Borough Hall") %>%
  stop_distances() %>%
  arrange(desc(distance))
#> # A tibble: 36 × 3
#>    from_stop_id to_stop_id distance
#>    <chr>        <chr>         <dbl>
#>  1 423          232            91.5
#>  2 423N         232            91.5
#>  3 423S         232            91.5
#>  4 423          232N           91.5
#>  5 423N         232N           91.5
#>  6 423S         232N           91.5
#>  7 423          232S           91.5
#>  8 423N         232S           91.5
#>  9 423S         232S           91.5
#> 10 232          423            91.5
#> # … with 26 more rows
## End(Not run)
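A base-R sketch of an all-pairs stop distance table, which makes the nrow(gtfs_stops)^2 growth visible. Great-circle (haversine) distance on a sphere with a mean Earth radius of 6371008.8 m is an assumption here; tidytransit may compute distances differently. The helper haversine_m, the stop ids and the coordinates are hypothetical.

```r
# Great-circle distance in metres on a sphere (assumed mean Earth radius)
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical stops
stops <- data.frame(stop_id  = c("a", "b", "c"),
                    stop_lon = c(-73.990, -73.990, -73.980),
                    stop_lat = c(40.693, 40.694, 40.693),
                    stringsAsFactors = FALSE)

# Every ordered pair of stops, including each stop with itself: n^2 rows
pairs <- expand.grid(from = seq_len(nrow(stops)), to = seq_len(nrow(stops)))
distances <- data.frame(
  from_stop_id = stops$stop_id[pairs$from],
  to_stop_id   = stops$stop_id[pairs$to],
  distance     = haversine_m(stops$stop_lon[pairs$from], stops$stop_lat[pairs$from],
                             stops$stop_lon[pairs$to],   stops$stop_lat[pairs$to]),
  stringsAsFactors = FALSE
)
nrow(distances)  # 9
```

With 3 stops the table already has 9 rows; a feed with 10,000 stops would produce 100 million.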
By default calculates distances among stop_ids with the same stop_name.
stop_group_distances(gtfs_stops, by = "stop_name")
gtfs_stops |
gtfs stops table either as data frame (with at least |
by |
group column, default: "stop_name" |
data.frame with one row per group containing a distance matrix (distances), number of stop ids within that group (n_stop_ids) and distance summary values (dist_mean, dist_median and dist_max).
## Not run:
library(tidytransit)
library(dplyr)
nyc_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(nyc_path)

stop_group_distances(nyc$stops)
#> # A tibble: 380 × 6
#>    stop_name   distances       n_stop_ids dist_mean dist_median dist_max
#>    <chr>       <list>               <dbl>     <dbl>       <dbl>    <dbl>
#>  1 86 St       <dbl [18 × 18]>         18     5395.       5395.   21811.
#>  2 79 St       <dbl [6 × 6]>            6    19053.      19053.   19053.
#>  3 Prospect Av <dbl [6 × 6]>            6    18804.      18804.   18804.
#>  4 77 St       <dbl [6 × 6]>            6    16947.      16947.   16947.
#>  5 59 St       <dbl [6 × 6]>            6    14130.      14130.   14130.
#>  6 50 St       <dbl [9 × 9]>            9     7097.       7097.   14068.
#>  7 36 St       <dbl [6 × 6]>            6    12496.      12496.   12496.
#>  8 8 Av        <dbl [6 × 6]>            6    11682.      11682.   11682.
#>  9 7 Av        <dbl [9 × 9]>            9     5479.       5479.   10753.
#> 10 111 St      <dbl [9 × 9]>            9     3877.       3877.    7753.
#> # … with 370 more rows
## End(Not run)
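Per-group summaries of this kind can be sketched in base R with split() and dist(). Coordinates are assumed to be already projected to metres (x/y), and the summaries here are taken over all distinct stop pairs, which need not match tidytransit's exact definitions of dist_mean and dist_median. The stops are hypothetical.

```r
# Hypothetical stops with projected coordinates in metres
stops <- data.frame(
  stop_id   = c("a1", "a2", "a3", "m1", "m2"),
  stop_name = c("86 St", "86 St", "86 St", "Main St", "Main St"),
  x = c(0, 30, 5000, 0, 40),
  y = c(0, 40, 0, 0, 30),
  stringsAsFactors = FALSE
)

group_summary <- lapply(split(stops, stops$stop_name), function(g) {
  d <- as.matrix(dist(g[, c("x", "y")]))  # Euclidean distance matrix
  pair_d <- d[upper.tri(d)]               # each distinct pair once
  data.frame(stop_name   = g$stop_name[1],
             n_stop_ids  = nrow(g),
             dist_mean   = mean(pair_d),
             dist_median = median(pair_d),
             dist_max    = max(pair_d),
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, group_summary)

# Groups with the most spread-out stops first
result[order(-result$dist_max), ]
```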
Convert stops into Simple Features Points
stops_as_sf(stops, crs = NULL)
stops |
a gtfs$stops dataframe |
crs |
optional coordinate reference system (used by sf::st_transform) to transform lon/lat coordinates |
an sf dataframe for gtfs stops with a point column
library(tidytransit)
data(gtfs_duke)
some_stops <- gtfs_duke$stops[sample(nrow(gtfs_duke$stops), 40), ]
some_stops_sf <- stops_as_sf(some_stops)
plot(some_stops_sf[, "stop_name"])
GTFS feed summary
## S3 method for class 'tidygtfs'
summary(object, ...)
object |
a tidygtfs object as read by |
... |
ignored for tidygtfs |
the tidygtfs object, invisibly
Function to calculate the shortest travel times from a stop (given by stop_name) to all other stop_names of a feed. filtered_stop_times needs to be created beforehand with filter_stop_times() or filter_feed_by_date().
travel_times(
  filtered_stop_times,
  stop_name,
  time_range = 3600,
  arrival = FALSE,
  max_transfers = NULL,
  max_departure_time = NULL,
  return_coords = FALSE,
  return_DT = FALSE,
  stop_dist_check = 300
)
filtered_stop_times |
stop_times data.table (with transfers and stops tables as
attributes) created with |
stop_name |
Stop name for which travel times should be calculated. A vector with multiple names can be used. |
time_range |
Either a range in seconds or a vector containing the minimal and maximal
departure time (i.e. earliest and latest possible journey departure time)
as seconds or "HH:MM:SS" character. If |
arrival |
If FALSE (default), all journeys start from |
max_transfers |
The maximum number of transfers. No limit if |
max_departure_time |
Deprecated. Use |
return_coords |
Returns stop coordinates (lon/lat) as columns. Default is FALSE. |
return_DT |
travel_times() returns a data.table if TRUE. Default is FALSE which
returns a |
stop_dist_check |
stop_names are not structured identifiers like
stop_ids or parent_stations, so it's possible that
stops with the same name are far apart. travel_times()
errors if the distance among stop_ids with the same name is
above this threshold (in meters).
Use FALSE to turn check off. However, it is recommended to
either use |
This function allows easier access to raptor() by using stop names instead of ids and returning shortest travel times by default.
Note, however, that stop_name might not be a suitable identifier for a feed. It is possible that multiple stops have the same name while not being related or geographically close to each other. stop_group_distances() and cluster_stops() can help identify and fix issues with stop_names.
A table with travel times to/from all stops reachable by stop_name and their corresponding journey departure and arrival times.
library(tidytransit)
library(dplyr)

# 1) Calculate travel times from two closely related stops
# The example dataset gtfs_duke has missing times (allowed in gtfs) which is
# why we run interpolate_stop_times beforehand
gtfs = interpolate_stop_times(gtfs_duke)

tts1 = gtfs %>%
  filter_feed_by_date("2019-08-26") %>%
  travel_times(c("Campus Dr at Arts Annex (WB)", "Campus Dr at Arts Annex (EB)"),
               time_range = c("14:00:00", "15:30:00"))

# you can use either filter_feed_by_date or filter_stop_times to prepare the feed
# the result is the same
tts2 = gtfs %>%
  filter_stop_times("2019-08-26", "14:00:00") %>%
  travel_times(c("Campus Dr at Arts Annex (WB)", "Campus Dr at Arts Annex (EB)"),
               time_range = 1.5*3600) # 1.5h after 14:00

all(tts1 == tts2)
# It's recommended to store the filtered feed, since it can be time consuming to
# run it for every travel time calculation, see the next example steps

# 2) separate filtering and travel time calculation for a more granular analysis
# stop_names in this feed are not restricted to an area, create clusters of stops to fix
nyc_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
nyc <- read_gtfs(nyc_path)
nyc <- cluster_stops(nyc, group_col = "stop_name", cluster_colname = "stop_name")

# Use journeys departing after 7 AM with arrival time before 9 AM on 26th June
stop_times <- filter_stop_times(nyc, "2018-06-26", 7*3600, 9*3600)

# Calculate travel times from "34 St - Herald Sq"
tts <- travel_times(stop_times, "34 St - Herald Sq", return_coords = TRUE)

# only keep journeys under one hour for plotting
tts <- tts %>% filter(travel_time <= 3600)

# travel time to Queensboro Plaza is 810 seconds, 13:30 minutes
tts %>%
  filter(to_stop_name == "Queensboro Plaza") %>%
  mutate(travel_time = hms::hms(travel_time))

# plot a simple map showing travel times to all reachable stops
# this can be expanded to isochrone maps
library(ggplot2)
ggplot(tts) + geom_point(aes(x = to_stop_lon, y = to_stop_lat, color = travel_time))
Validates the GTFS object against the GTFS specification and raises warnings if
required files/fields are not found. This function is called by read_gtfs().
validate_gtfs(gtfs_obj, files = NULL, warnings = TRUE)
gtfs_obj |
gtfs object (i.e. a list of tables, not necessarily a tidygtfs object) |
files |
A character vector containing the names of the text files to be validated
against the GTFS specification, without the file extension (.txt). |
warnings |
Whether to display warning messages (defaults to TRUE). |
Note that this function only checks whether required files or fields are missing. It does not check internal consistency (e.g. that no departure time precedes its arrival time, or that the calendar covers a reasonable period).
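Because validate_gtfs() leaves consistency checks like these to the user, a minimal base R sketch of one of them may be useful. It assumes a stop_times table whose arrival_time and departure_time columns hold raw GTFS "HH:MM:SS" strings (tidytransit may store times differently after read_gtfs()). The helpers hms_to_seconds() and find_departures_before_arrivals() are hypothetical and not part of tidytransit.

```r
# Hypothetical helper (not part of tidytransit): convert GTFS "HH:MM:SS"
# strings to seconds after midnight. GTFS times may exceed 24:00:00 for trips
# running past midnight, so the string is split manually rather than parsed
# as a clock time. Malformed or missing values become NA.
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 3) return(NA_real_)
    sum(as.numeric(p) * c(3600, 60, 1))
  }, numeric(1))
}

# Hypothetical consistency check (validate_gtfs() does not perform it):
# return the rows whose departure_time is earlier than their arrival_time.
# Rows with a missing time in either column are ignored.
find_departures_before_arrivals <- function(stop_times) {
  arr <- hms_to_seconds(stop_times$arrival_time)
  dep <- hms_to_seconds(stop_times$departure_time)
  bad <- !is.na(arr) & !is.na(dep) & dep < arr
  stop_times[bad, , drop = FALSE]
}

# Small made-up table for illustration
stop_times <- data.frame(
  trip_id = c("t1", "t1", "t2"),
  stop_sequence = c(1, 2, 1),
  arrival_time = c("08:00:00", "08:10:00", "25:05:00"),
  departure_time = c("08:00:00", "08:09:30", "25:06:00"),
  stringsAsFactors = FALSE
)
find_departures_before_arrivals(stop_times)  # flags t1, stop_sequence 2
```

The check works on numeric seconds instead of clock times so that after-midnight values such as "25:05:00" compare correctly.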
A validation_result tibble containing the validation summary of all possible fields from the specified files.
The GTFS object's files and fields are validated against the GTFS specification as documented in the GTFS Schedule Reference.
GTFS feeds are considered valid if they include all required files and fields. If a required file or field is missing, the function raises a warning (if warnings = TRUE).
Optional files/fields are listed in the reference above but are not required, so no warning is raised if they are missing.
Extra files/fields are those that are not listed in the reference above (either because they belong to a specific GTFS extension or for any other reason).
Note that some files (calendar.txt, calendar_dates.txt and feed_info.txt) are conditionally required. This means that:
calendar.txt is initially set as a required file. If it is not present, however, it becomes optional and calendar_dates.txt (originally set as optional) becomes required.
feed_info.txt is initially set as an optional file. If translations.txt is present, however, it becomes required.
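The conditional rules for calendar.txt, calendar_dates.txt and feed_info.txt can be sketched in base R as follows. The helpers resolve_conditional_files() and missing_required() are hypothetical, not tidytransit functions, and take file names without the .txt extension; the sketch only illustrates the logic described above, not how validate_gtfs() implements it.

```r
# Hypothetical helper (not part of tidytransit): status ("required" or
# "optional") of the three conditionally required files, given the names of
# the files present in a feed (without ".txt").
resolve_conditional_files <- function(present) {
  status <- c(calendar = "required",
              calendar_dates = "optional",
              feed_info = "optional")
  # calendar.txt missing: it becomes optional, calendar_dates.txt required
  if (!"calendar" %in% present) {
    status[["calendar"]] <- "optional"
    status[["calendar_dates"]] <- "required"
  }
  # translations.txt present: feed_info.txt becomes required
  if ("translations" %in% present) {
    status[["feed_info"]] <- "required"
  }
  status
}

# Hypothetical helper: conditional files that end up required but are absent
missing_required <- function(present) {
  status <- resolve_conditional_files(present)
  names(status)[status == "required" & !names(status) %in% present]
}

resolve_conditional_files(c("agency", "stops", "calendar_dates"))
# calendar = "optional", calendar_dates = "required", feed_info = "optional"
missing_required(c("calendar", "translations"))  # "feed_info"
missing_required("agency")                       # "calendar_dates"
```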
validate_gtfs(gtfs_duke)

## Not run:
local_gtfs_path <- system.file("extdata", "nyc_subway.zip", package = "tidytransit")
gtfs <- read_gtfs(local_gtfs_path)
attr(gtfs, "validation_result")

# shapes.txt is optional, so removing it raises no warning
gtfs$shapes <- NULL
validation_result <- validate_gtfs(gtfs)

# stop_times.txt is required, so this should raise a warning
gtfs$stop_times <- NULL
validation_result <- validate_gtfs(gtfs)
## End(Not run)
Write a tidygtfs object to a zip file
write_gtfs(gtfs_obj, zipfile, compression_level = 9, as_dir = FALSE)
gtfs_obj |
gtfs feed (tidygtfs object) |
zipfile |
path to the zip file the feed should be written to. The file is overwritten if it already exists. |
compression_level |
a number between 1 and 9, defaults to 9 (best compression). |
as_dir |
if |
Invisibly returns gtfs_obj
Auxiliary tidytransit tables (e.g. dates_services) are not exported. Calls gtfsio::export_gtfs() after preparing the data.
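To illustrate what exporting a feed involves, here is a minimal base R sketch that writes each table of a feed list to its own .txt file and skips auxiliary tables such as dates_services. It is not how write_gtfs() or gtfsio::export_gtfs() work internally: the helper write_tables_as_txt() is hypothetical, it writes plain CSV with write.csv(), and it does not compress the result into a zip file.

```r
# Hypothetical sketch, not tidytransit's implementation: write each table in
# a named list of data frames to "<name>.txt" inside `dir`, skipping
# auxiliary tables that are not part of the GTFS specification.
write_tables_as_txt <- function(tables, dir, exclude = "dates_services") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  keep <- setdiff(names(tables), exclude)
  for (name in keep) {
    utils::write.csv(tables[[name]], file.path(dir, paste0(name, ".txt")),
                     row.names = FALSE, na = "")
  }
  # Return the written paths invisibly, similar in spirit to write_gtfs()
  invisible(file.path(dir, paste0(keep, ".txt")))
}

# Small made-up feed for illustration
feed <- list(
  agency = data.frame(agency_id = "A", agency_name = "Demo Transit"),
  dates_services = data.frame(date = as.Date("2019-08-26"), service_id = "S1")
)
out_dir <- file.path(tempdir(), "demo_feed")
paths <- write_tables_as_txt(feed, out_dir)
list.files(out_dir)  # "agency.txt"
```

In practice write_gtfs() should be preferred, since it also handles zipping (with compression_level) and the data preparation mentioned above.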