In this project, we geocode crash data to identify the spots where accidents involving bikes occur in the province of Québec. This will allow us to determine the areas where an intervention to reduce the risk to active transportation would be most useful.

Data sources

Open data about the 2011-2016 car crashes reported to the police come from the province of Québec’s open data portal.
The data dictionary is also available online.

Packages used

Data wrangling is done using packages from the tidyverse. Addresses are geocoded using the ggmap package and the resulting geospatial data is processed using sf.
Data visualisations are done using ggplot2 and leaflet. If I remember correctly, the leaflet.extras package is used to draw the leaflet heatmap.
Color palettes for the maps are generated using the viridis package.
Data tables are displayed using the DT package.


The code that generated this document is located in https://github.com/SimonCoulombe/snippets/blob/master/content/post/2018-1-16-saaqmtq.Rmd

Define functions, download files

Clean data and prepare for geocoding

Here is a snapshot of the data as it was received

When preparing the car crash (“accidents”) data, we generate a factor variable of the seriousness (“gravité”) of the crash, from least serious to most serious.

French                          English
Dommages matériels seulement    fender bender
Léger                           minor injuries
Grave                           major injuries
Mortel                          deadly
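As a minimal sketch of what such an ordered factor could look like in base R: the sample values and the variable names gravite_raw and gravite_levels are hypothetical, only the four labels come from the table above.

```r
# Hypothetical sample of raw seriousness labels (not real records).
gravite_raw <- c("Léger", "Mortel", "Dommages matériels seulement", "Grave", "Léger")

# Levels listed from least serious to most serious.
gravite_levels <- c("Dommages matériels seulement", "Léger", "Grave", "Mortel")

gravite <- factor(gravite_raw, levels = gravite_levels, ordered = TRUE)

# Because the factor is ordered, max() and comparisons respect seriousness.
worst   <- max(gravite)
serious <- gravite >= "Grave"
```

Declaring the order once means later tables and plots sort from least to most serious instead of alphabetically.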

To convert the municipality codes to names, I created a tab-separated file from this table on the provincial government website.

The dataset contains multiple variables related to the crash location, but it doesn't include the latitude and longitude of the crash. We need to create a string variable (I called it “location”) that is passed to the Google Maps API so that it can return a latitude and a longitude.

The geographical variables are as follows. They are never all filled.

  • “NO_CIVIQ_ACCDN”, the street civic number
  • “SFX_NO_CIVIQ_ACCDN”, a suffix to the street number
  • “RUE_ACCDN”, the road name
  • “CD_MUNCP”, the city code. Here is a dictionary to convert city code to name.
  • “NO_ROUTE” is the road number where the accident happened (numbered roads are typically highways). This seems to be used as an alternative to the road name RUE_ACCDN.
  • “CD_PNT_CDRNL_ROUTE” is the direction (North, South, East, West (Ouest)) travelled on the road/highway.
  • “BORNE_KM_ACCDN” is the milestone number (used on highways and northern gravel roads)

They also use landmarks (road crossings, etc.) to help locate the accident:

  • “TP_REPRR_ACCDN” is the type of landmark:
      • 1 means the intersection of two roads,
      • 2 means “other landmark”,
      • 0 means the type is not specified.

  • “ACCDN_PRES_DE” is the landmark that the type refers to. It can be the road that intersects the road named under “RUE_ACCDN”, a bridge, a school name, etc.
  • “NB_METRE_DIST_ACCD” is the distance in meters between the landmark and the accident.
  • “CD_PNT_CDRNL_REPRR” is the direction (North, South, East, West (Ouest)) from the landmark to the accident.

Since most crashes involving pedestrians and bikes are located in the cities, the data typically contains either the street civic number and street name, or the names of the two streets at the road crossing. I didn't try to geocode the more complicated cases involving the milestone number.
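The two simple cases can be sketched as follows. This is only an illustration under stated assumptions: the column names come from the data dictionary, but the toy values, the city column (the name looked up from CD_MUNCP), the build_location helper, the " & " separator, the ", QC, Canada" suffix and the rule that a civic number wins over a crossing are all hypothetical choices.

```r
# Toy records (made-up values) using the column names from the data dictionary.
crashes <- data.frame(
  NO_CIVIQ_ACCDN = c("410", NA, NA),
  RUE_ACCDN      = c("3e Avenue", "3e Avenue", NA),
  TP_REPRR_ACCDN = c(0, 1, 0),
  ACCDN_PRES_DE  = c(NA, "4e Rue", NA),
  city           = c("Québec", "Québec", "Québec"),  # hypothetical lookup from CD_MUNCP
  stringsAsFactors = FALSE
)

build_location <- function(df) {
  suffix <- ", QC, Canada"
  has_number   <- !is.na(df$NO_CIVIQ_ACCDN) & !is.na(df$RUE_ACCDN)
  has_crossing <- !is.na(df$RUE_ACCDN) & !is.na(df$ACCDN_PRES_DE) &
    df$TP_REPRR_ACCDN %in% 1

  location <- rep(NA_character_, nrow(df))
  # Case 1: intersection of two roads.
  location[has_crossing] <- paste0(
    df$RUE_ACCDN[has_crossing], " & ", df$ACCDN_PRES_DE[has_crossing],
    ", ", df$city[has_crossing], suffix
  )
  # Case 2: civic number + street name (assumed to take priority when present).
  location[has_number] <- paste0(
    df$NO_CIVIQ_ACCDN[has_number], " ", df$RUE_ACCDN[has_number],
    ", ", df$city[has_number], suffix
  )
  location
}

crashes$location <- build_location(crashes)
```

Rows that match neither case keep a missing location, which makes it easy to list the crashes that were never sent to the geocoder.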

Before creating the string that would be passed to the Google Maps API, I first had to replace a lot of abbreviations using regular expressions. For example, “BD” is actually “Boulevard” and “ST” usually stands for “Saint”. The regular expression tool of choice was “\b”, which matches the boundary of a word.
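Here is a small sketch of the word-boundary idea in base R. The expand_abbreviations helper and the sample street names are hypothetical, and only the two abbreviations mentioned above are handled.

```r
# Replace whole-word abbreviations only; \\b prevents matches inside longer words.
expand_abbreviations <- function(x) {
  x <- gsub("\\bBD\\b", "BOULEVARD", x, perl = TRUE)
  x <- gsub("\\bST\\b", "SAINT", x, perl = TRUE)
  x
}

# Made-up street names for illustration.
streets  <- c("BD ST JOSEPH", "RUE STANLEY", "CH STE FOY")
expanded <- expand_abbreviations(streets)
```

Without the boundaries, “ST” would also be replaced inside “STANLEY” and “STE”, which is exactly the kind of damage the boundary avoids.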

Geocoding using ggmap

I used the ggmap package to geocode the car crashes through the Google Maps API. The free version of the API is limited to 2,500 calls per day, so I had to get a premium API key. This project didn't cost me anything since I had some Google credit left over from I don't know when.
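With a daily quota in play, it helps to geocode each distinct location string only once. The sketch below is one possible way to do that, not necessarily what the original code does: geocode_locations and fake_geocoder are hypothetical helpers, the environment variable name is an assumption, and the real ggmap call is shown only as a comment because it needs a key and network access.

```r
# geocode_fun is any function that takes one address string and returns a
# one-row data frame with lon and lat columns (as ggmap::geocode() does).
geocode_locations <- function(locations, geocode_fun) {
  unique_locations <- unique(locations[!is.na(locations)])
  results <- lapply(unique_locations, function(loc) {
    out <- tryCatch(geocode_fun(loc), error = function(e) NULL)
    if (is.null(out)) out <- data.frame(lon = NA_real_, lat = NA_real_)
    data.frame(location = loc, lon = out$lon[1], lat = out$lat[1],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

# With a registered key, the real call might look like this (not run here):
# ggmap::register_google(key = Sys.getenv("GOOGLE_MAPS_API_KEY"))
# coords <- geocode_locations(
#   crashes$location,
#   function(loc) ggmap::geocode(loc, output = "latlon", source = "google")
# )

# Stub geocoder for demonstration; the coordinates are placeholders.
fake_geocoder <- function(loc) {
  if (grepl("unknown", loc)) stop("no result")
  data.frame(lon = -71.2, lat = 46.8)
}

coords <- geocode_locations(c("A", "A", "unknown place", NA), fake_geocoder)
```

The result has one row per distinct location, with missing coordinates where the geocoder failed, and can be joined back to the crashes on the location column.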

A total of 11,639 crashes involving bikes occurred in the province of Québec between 2011 and 2016.

I didn't attempt to geocode 213 of them because I wasn't able to generate a satisfactory “location” string. These crashes are listed in the appendix.

The API couldn't return a latitude/longitude for 392 of the 11,426 that I tried to geocode. These are also listed in the appendix.

Exploratory data analysis (pre-geocoding)

I generate some tables and graphs here before moving toward our goal of listing the locations with the most crashes. The goal is to make sure that the data is sane and maybe to generate additional questions for future projects.

Crashes by year

Crashes by month of the year

Crashes by time of the day

Crashes by weather conditions

Crashes by administrative area

Crashes by city (top 10)

Crashes by seriousness

Results - geocoded data

The rest of the analysis only involves the crashes that were successfully geocoded.

Top 15 locations with the most crashes

The table below shows the 15 locations with the most crashes involving bikes in the province of Québec between 2011 and 2016.
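A ranking like this can be produced by counting crashes per geocoded spot. The following base R sketch assumes crashes are grouped by location string and returned coordinates, which may differ from the grouping used for the actual table; the data is made up.

```r
# Toy geocoded crashes (made-up labels and coordinates).
geocoded <- data.frame(
  location = c("A", "A", "B", "A", "C", "B"),
  lon      = c(-71.23, -71.23, -73.57, -71.23, -72.00, -73.57),
  lat      = c(46.82, 46.82, 45.50, 46.82, 46.00, 45.50),
  stringsAsFactors = FALSE
)

# Count crashes per spot, then keep the 15 spots with the most crashes.
counts <- aggregate(n ~ location + lon + lat,
                    data = transform(geocoded, n = 1L), FUN = sum)
top <- head(counts[order(-counts$n), ], 15)
```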

Results - leaflets

Map of deadly crashes

This map shows all 66 deadly crashes involving bikes in the province of Québec between 2011 and 2016 that were successfully geocoded. If multiple crashes occurred at the same spot, only the most recent one is visible.


The following heatmap allows us to easily spot dangerous areas even if the crashes didn't occur at the exact same coordinates.
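For readers who want to build a similar heatmap, here is a minimal sketch using leaflet and leaflet.extras. The points are made up and the radius and blur values are arbitrary choices, not the settings used for the map above.

```r
library(leaflet)
library(leaflet.extras)

# Made-up points: two crashes close to each other and one further away.
pts <- data.frame(
  lon = c(-71.2300, -71.2302, -71.2500),
  lat = c(46.8300, 46.8301, 46.8100)
)

# Nearby points add up, so close-by crashes show as a single hot area.
heat <- leaflet(pts) %>%
  addTiles() %>%
  addHeatmap(lng = ~lon, lat = ~lat, radius = 15, blur = 20)
```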


This last map shows clusters of accidents. If you zoom to the maximum, you will be able to see the details of all crashes that occurred at the intersection of 3e avenue and 4e rue in Québec City, but also the crash that occurred right next to it at “410 3e avenue”.
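A clustered map of this kind can be sketched with leaflet's marker clustering. The coordinates and labels below are placeholders, not the real geocoded positions.

```r
library(leaflet)

# Placeholder crashes: two at the same spot and one right next to it.
crash_pts <- data.frame(
  lon   = c(-71.2300, -71.2300, -71.2302),
  lat   = c(46.8300, 46.8300, 46.8301),
  label = c("crash 1", "crash 2", "crash 3")
)

# clusterOptions groups nearby markers; zooming in splits the clusters, and
# markers sharing the same coordinates fan out when the cluster is clicked.
cluster_map <- leaflet(crash_pts) %>%
  addTiles() %>%
  addMarkers(lng = ~lon, lat = ~lat, popup = ~label,
             clusterOptions = markerClusterOptions())
```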


The City of Montreal is the largest city in the province. It has released datasets that I believe could be very useful for pushing this analysis forward.

Counts of bikes travelling on bike paths, the locations of the counters, a shapefile of the bike paths and telemetry data are all available.

It is extremely useful to know which spots have the most crashes because this is where the city should work first to improve the safety of users.
It would also be very interesting to know at which spots the ratio of crashes per trip is high. Low-traffic spots with a high crash count indicate a dangerous road configuration that shouldn't be replicated.
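To illustrate the ratio with made-up numbers (these are not real counts, and the column names are hypothetical), a crash rate per 100,000 trips could be computed like this:

```r
# Entirely made-up numbers for two hypothetical spots.
spots <- data.frame(
  spot    = c("busy path", "quiet crossing"),
  crashes = c(12, 4),
  trips   = c(600000, 20000)
)

spots$crashes_per_100k_trips <- 1e5 * spots$crashes / spots$trips

# Rank by rate rather than by raw count.
ranked <- spots[order(-spots$crashes_per_100k_trips), ]
```

In this toy example the quiet crossing has fewer crashes but ten times the rate, so it ranks first once traffic is taken into account.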

That’s it folks!