Geocoding police reports to find the spot where the most bike crashes occur
Objective
In this project, we will geocode the crash data to identify the spots where accidents involving bikes occur in the province of Québec. This will allow us to determine the areas where an intervention to reduce the risk to active transportation would be most useful.
Data sources
Open data about the 2011-2016 car crashes reported to the police come from the province of Québec’s open data portal.
The data dictionary is also available on-line.
Packages used
Data wrangling is done using packages from the tidyverse.
Geospatial data is geocoded using the ggmap package and treated using sf.
Data visualisations are done using ggplot2 and leaflet.
If I remember correctly, the leaflet.extras package is used to create the leaflet heatmap.
Color palettes for the maps are generated using the viridis package.
Data tables are displayed using the DT package.
Code
The code that generated this document is available at https://github.com/SimonCoulombe/snippets/blob/master/content/post/2018-1-16-saaqmtq.Rmd
Define functions, download files
Clean data and prepare for geocoding
Here is a snapshot of the data as it was received
When preparing the car crash (“accidents”) data, we generate a factor variable of the seriousness (“gravité”) of the crash, from least serious to most serious.
| Gravité (French) | Seriousness (English) |
|---|---|
| Dommages matériels seulement | fender bender |
| Léger | minor injuries |
| Grave | major injuries |
| Mortel | deadly |
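The ordering in the table above can be sketched in Python. The original analysis used an R factor; this is only an illustration of the same idea. It assumes the raw “gravité” values are exactly the French labels in the table, and ranks each crash by the position of its label.

```python
# Seriousness levels from least to most serious. The French strings are
# assumed to match the raw "gravité" values in the crash file exactly.
SERIOUSNESS_LEVELS = [
    "Dommages matériels seulement",  # fender bender
    "Léger",                         # minor injuries
    "Grave",                         # major injuries
    "Mortel",                        # deadly
]

# Map each label to its position so crashes can be compared and sorted.
SERIOUSNESS_RANK = {label: rank for rank, label in enumerate(SERIOUSNESS_LEVELS)}


def seriousness_rank(label: str) -> int:
    """Return 0 (least serious) to 3 (most serious); reject unknown labels."""
    if label not in SERIOUSNESS_RANK:
        raise ValueError(f"unknown seriousness label: {label!r}")
    return SERIOUSNESS_RANK[label]


crashes = ["Grave", "Dommages matériels seulement", "Mortel", "Léger"]
print(sorted(crashes, key=seriousness_rank))
# ['Dommages matériels seulement', 'Léger', 'Grave', 'Mortel']
```

Raising on unknown labels, rather than silently ranking them, surfaces any spelling variant in the raw file early.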
To convert the municipality codes to names, I created a tab-separated file from this table on the provincial government website.
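Reading such a tab-separated lookup can be sketched with Python’s standard csv module. The column headers "code" and "name" are hypothetical (the real file’s headers are not given here), and the sample rows are placeholders, not real municipality codes.

```python
import csv
import io


def load_municipality_names(tsv_file) -> dict:
    """Build a {code: name} lookup from a tab-separated file with a header row.

    The header names "code" and "name" are hypothetical; use whatever the
    real file calls its columns. Codes are kept as strings so that any
    leading zeros survive.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {row["code"].strip(): row["name"].strip() for row in reader}


# In-memory stand-in for the real file (placeholder codes and names).
sample = io.StringIO("code\tname\n12345\tVille A\n67890\tVille B\n")
names = load_municipality_names(sample)
print(names.get("12345", "unknown"))  # Ville A
```

With a real file, opening it with `open(path, newline="", encoding="utf-8")` keeps accented city names intact.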
The dataset contains multiple variables related to the crash location, but it doesn’t include the latitude and longitude of the crash. We will need to create a string variable (I called it “location”) that will be passed to the Google Maps API so that it can return a latitude and a longitude.
The geographical variables are as follows. They are never all filled in.
- “NO_CIVIQ_ACCDN”, the street civic number
- “SFX_NO_CIVIQ_ACCDN”, a suffix to the street number
- “RUE_ACCDN”, the road name
- “CD_MUNCP”, the city code. Here is a dictionary to convert city codes to names.
- “NO_ROUTE” is the road number where the accident happened (numbered roads are typically highways). This seems to be used as an alternative to the road name “RUE_ACCDN”.
- “CD_PNT_CDRNL_ROUTE” is the direction (North, South, East, West (Ouest)) travelled on the road/highway.
- “BORNE_KM_ACCDN” is the milestone number (used on highways and northern gravel roads).
They also use landmarks (road crossings, etc.) to help locate the accident:
- “TP_REPRR_ACCDN” is the type of landmark:
  - 1 means the intersection of two roads,
  - 2 means “other landmark”,
  - 0 means the type is not specified.
- “ACCDN_PRES_DE” is the landmark that the type refers to. It can be the road that intersects the road named under “RUE_ACCDN”, a bridge, a school name, etc.
- “NB_METRE_DIST_ACCD” is the distance in meters between the landmark and the accident.
- “CD_PNT_CDRNL_REPRR” is the direction (North, South, East, West (Ouest)) from the landmark to the accident.
Since most crashes involving pedestrians and bikes are located in cities, the data typically contains either the civic number and street name, or the names of the two streets at the intersection. I didn’t try to geocode the more complicated cases involving the milestone number.
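The two common cases can be turned into a “location” string roughly as follows. This is a Python sketch, not the author’s R code. The field names come from the variable list above. The output format (“street & cross street, city, QC”) is an assumption, and so is the convention that missing values arrive as None or empty strings and landmark codes as text or integers.

```python
from typing import Optional


def build_location(row: dict, city: str) -> Optional[str]:
    """Return an address string for the geocoder, or None if the row is skipped.

    Field names come from the crash file; the output format and the
    ", QC" suffix are assumptions. Missing values are assumed to be None
    or empty strings.
    """
    def field(name: str) -> str:
        value = row.get(name)
        return str(value).strip() if value is not None else ""

    number = field("NO_CIVIQ_ACCDN")
    suffix = field("SFX_NO_CIVIQ_ACCDN")
    street = field("RUE_ACCDN")
    landmark_type = field("TP_REPRR_ACCDN")
    cross_street = field("ACCDN_PRES_DE")

    if not street or not city:
        return None  # e.g. highway/milestone cases, which are not handled
    if number:
        # Case 1: civic number (plus optional suffix) on a named street.
        return f"{number}{suffix} {street}, {city}, QC"
    if landmark_type == "1" and cross_street:
        # Case 2: landmark type 1 = intersection of two roads.
        return f"{street} & {cross_street}, {city}, QC"
    return None


print(build_location({"NO_CIVIQ_ACCDN": "410", "RUE_ACCDN": "3E AVENUE"}, "Québec"))
# 410 3E AVENUE, Québec, QC
print(build_location({"RUE_ACCDN": "3E AVENUE", "TP_REPRR_ACCDN": 1,
                      "ACCDN_PRES_DE": "4E RUE"}, "Québec"))
# 3E AVENUE & 4E RUE, Québec, QC
```

Returning None for everything else keeps the unhandled rows easy to collect for an appendix.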
Before creating the string passed to the Google Maps API, I first had to replace a lot of abbreviations using regular expressions. For example, “BD” is actually “Boulevard” and “ST” usually stands for “Saint”. The regular expression tool of choice was “\b”, which matches the boundary of a word.
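The word-boundary idea can be sketched with Python’s re module. In R the same anchor is written "\\b" inside a string. Only “BD” and “ST” come from the text above; the other abbreviations in the dictionary are hypothetical additions, and the input is assumed to be upper-cased first.

```python
import re

# Abbreviation -> full word. "BD" and "ST" are the examples given above;
# the other entries are hypothetical additions in the same spirit.
ABBREVIATIONS = {
    "BD": "BOULEVARD",
    "ST": "SAINT",
    "STE": "SAINTE",
    "AV": "AVENUE",
    "CH": "CHEMIN",
}


def expand_abbreviations(text: str) -> str:
    """Upper-case the text, then expand whole-word abbreviations only."""
    result = text.upper()
    for short, full in ABBREVIATIONS.items():
        # \b on both sides: "ST" matches in "ST-JOSEPH" but not in "EST".
        result = re.sub(rf"\b{re.escape(short)}\b", full, result)
    return result


print(expand_abbreviations("1200 bd st-joseph est"))
# 1200 BOULEVARD SAINT-JOSEPH EST
```

Without the boundaries, “ST” would also be rewritten inside “EST” or “OUEST”, and “STE-FOY” would become “SAINTE-FOY” only by luck of ordering.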
Geocoding using ggmap
I used the ggmap package to geocode the car crashes through the Google Maps API.
The free version of the API is limited to 2 500 calls per day, so I had to get a premium API key. This project didn’t cost me anything since I had some Google credit left over from I don’t know when.
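One generic way to stretch a daily quota (not necessarily what was done here) is to send each distinct location string only once and stop at the limit. In this sketch, `geocoder` is a hypothetical stand-in for the real API call (ggmap’s geocode() in the original R workflow). The 2 500 default mirrors the free-tier limit quoted above.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLon = Tuple[float, float]


def geocode_unique(
    locations: Iterable[str],
    geocoder: Callable[[str], Optional[LatLon]],
    max_calls: int = 2500,
) -> Dict[str, Optional[LatLon]]:
    """Geocode each distinct location string once, stopping after max_calls.

    `geocoder` stands in for the real API call and returns (lat, lon) or None.
    Strings not reached before the quota is used up are absent from the
    result, so they can be sent on a later day.
    """
    results: Dict[str, Optional[LatLon]] = {}
    for location in dict.fromkeys(locations):  # de-duplicate, keep order
        if len(results) >= max_calls:
            break
        results[location] = geocoder(location)
    return results


def stub_geocoder(location: str) -> Optional[LatLon]:
    """Hypothetical stand-in that never finds anything."""
    return None


found = geocode_unique(["A", "A", "B", "C"], stub_geocoder, max_calls=2)
print(list(found))  # ['A', 'B']
```

Since many crashes share an intersection, de-duplicating before calling the API can save a noticeable share of calls.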
11639 crashes involving bikes occurred in the province of Québec between 2011 and 2016.
I didn’t attempt to geocode 213 of them because I wasn’t able to generate a satisfying “location” string. These crashes are listed in the appendix.
The API couldn’t return a latitude/longitude for 392 of the 11426 crashes that I tried to geocode; these are also listed in the appendix. That leaves 11034 successfully geocoded crashes.
Exploratory data analysis (pre-geocoding)
I generate some tables and graphs here before moving toward our goal of listing the locations with the most crashes. The goal is to make sure that the data is sane and maybe to raise additional questions for future projects.
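The tallies behind the sections below are simple group-by counts. This Python sketch uses `collections.Counter` on made-up dates (one deliberately outside the study period) and adds a basic sanity check that every year falls within 2011-2016.

```python
from collections import Counter
from datetime import date

# Made-up crash dates for illustration; the real ones come from the crash file.
crash_dates = [date(2011, 6, 3), date(2011, 7, 14), date(2012, 7, 2), date(2018, 8, 30)]

by_year = Counter(d.year for d in crash_dates)
by_month = Counter(d.month for d in crash_dates)

# Sanity check: the dataset should only cover 2011-2016.
out_of_range = sorted(year for year in by_year if not 2011 <= year <= 2016)

for year in sorted(by_year):
    print(year, by_year[year])
print("years outside 2011-2016:", out_of_range)
```

The same pattern works for time of day, weather, administrative area, city or seriousness: swap the key expression passed to Counter.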
Crashes by year
Crashes by month of the year
Crashes by time of the day
Crashes by weather conditions
Crashes by administrative area
Crashes by city (top 10)
Crashes by seriousness
Results - geocoded data
The rest of the analysis only involves the crashes that were successfully geocoded.
Top 15 locations with the most crashes
The table below shows the 15 locations with the most crashes involving bikes in the province of Québec between 2011 and 2016.
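A geocoder typically returns the same coordinates for the same address or intersection string, so crashes at one spot share a latitude/longitude pair. Ranking spots then boils down to counting identical pairs. This sketch may differ from the original, which could just as well have grouped by the “location” string. It rounds coordinates to an assumed 5 decimals before counting.

```python
from collections import Counter
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def top_locations(coords: List[Coord], n: int = 15, digits: int = 5) -> List[Tuple[Coord, int]]:
    """Count crashes per coordinate and return the n most frequent spots.

    Rounding to `digits` decimals (5 decimals is roughly 1 m of latitude)
    merges points that differ only by floating-point noise; the precision
    is an assumption.
    """
    rounded = ((round(lat, digits), round(lon, digits)) for lat, lon in coords)
    return Counter(rounded).most_common(n)


coords = [(46.81, -71.22), (46.81, -71.22), (45.5, -73.56)]
print(top_locations(coords, n=1))  # [((46.81, -71.22), 2)]
```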
Results - leaflets
Map of deadly crashes
This map shows all 66 deadly crashes involving bikes in the province of Québec between 2011 and 2016 that were successfully geocoded. If multiple crashes occurred at the same spot, we will only see the most recent one.
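To gauge how many markers end up hidden, one can keep only the latest crash per coordinate. That matches what the map shows if markers are drawn in chronological order, which is an assumption about how the map was built. The crash records below are made up.

```python
from datetime import date
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


def latest_per_spot(crashes: List[Tuple[Coord, date]]) -> Dict[Coord, date]:
    """Keep the most recent crash date for each coordinate."""
    latest: Dict[Coord, date] = {}
    for coord, when in crashes:
        if coord not in latest or when > latest[coord]:
            latest[coord] = when
    return latest


# Made-up deadly crashes: two share a spot.
crashes = [
    ((46.8, -71.2), date(2012, 5, 1)),
    ((46.8, -71.2), date(2015, 9, 9)),
    ((45.5, -73.6), date(2013, 1, 1)),
]
visible = latest_per_spot(crashes)
hidden = len(crashes) - len(visible)
print(hidden)  # 1
```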
Heatmap
The following heatmap makes it easy to spot dangerous areas even when the crashes didn’t occur at the exact same coordinates.
MarkerCluster
This last map shows clusters of accidents. If you zoom in to the maximum, you will be able to see the details of all crashes that occurred at the intersection of 3e avenue and 4e rue in Québec City, but also the crash that occurred right next to it at “410 3e avenue”.
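The zoom-dependent grouping can be illustrated with a deliberately simplified grid clustering. This is a swapped-in technique, not the marker-cluster plugin’s own algorithm (which measures distances on screen at each zoom level). Here a larger cell size plays the role of zooming out. The points are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def grid_clusters(points: List[Point], cell_deg: float) -> Dict[Tuple[int, int], List[Point]]:
    """Group points into square cells `cell_deg` degrees wide.

    Large cells behave like a zoomed-out map (many crashes per cluster);
    small cells behave like a zoomed-in map where each crash stands alone.
    """
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))
    return dict(cells)


points = [(46.8120, -71.2250), (46.8121, -71.2251), (46.9000, -71.3000)]
for size in (1.0, 0.01, 0.00001):
    print(size, len(grid_clusters(points, size)))  # cluster counts: 1, 2, 3
```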
Ideas
The City of Montreal is the largest city in the province. They have released three datasets that I believe could be very useful for pushing this analysis forward.
Counts of bikes travelling on bike paths, the location of the counters, a shapefile of the bike paths and telemetry data are all available.
It is extremely useful to know which spots have the most crashes, because this is where the city should first work to improve the safety of users.
It would also be very interesting to know at which spots the ratio of crashes per trip is high.
Low-traffic spots with high crash counts indicate a dangerous road configuration that shouldn’t be replicated.
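Once crashes have been matched to counter locations, the crash-per-trip ratio is a simple division. The matching step is not shown and would need its own assumptions, such as a distance threshold. The site IDs and numbers below are hypothetical, and scaling to 1,000 trips is an arbitrary choice for readability.

```python
from typing import Dict, List, Tuple


def crash_rates(crashes: Dict[str, int], trips: Dict[str, int]) -> List[Tuple[str, float]]:
    """Crashes per 1,000 counted bike trips, highest first.

    Sites are keyed by a hypothetical counter ID; sites with no counted
    trips are skipped because the ratio is undefined there.
    """
    rates = [
        (site, 1000 * crashes.get(site, 0) / count)
        for site, count in trips.items()
        if count > 0
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Made-up numbers: site "B" has fewer crashes but far less traffic.
print(crash_rates({"A": 10, "B": 3}, {"A": 100_000, "B": 5_000}))
# [('B', 0.6), ('A', 0.1)]
```

Site “B” ranks first despite having fewer crashes, which is exactly the low-traffic, high-risk pattern described above.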
That’s it, folks!