Discovering the Driving Forces Influencing Car Accidents in NYC
This project set out to explore how different factors correlate with car accidents, in particular testing the hypothesis that weather conditions such as precipitation would significantly increase collision rates. However, after combining weather and crash data, we found that other factors, such as driver behavior, are much better predictors of collisions, which ultimately highlights the importance of responsible driving habits over external conditions.
Data Description
Our visualization combines two datasets: motor vehicle collisions for the State of New York and weather data for New York City. We also used a TopoJSON dataset of the five boroughs of NYC. The motor vehicle collision dataset was downloaded from data.gov, and the weather data came from a Kaggle dataset.
The Motor Vehicle Collisions crash table contains detailed information about each crash event and covers all reported collisions that involve injury, death, or significant property damage. The data is collected using the MV104-AN police report form. It is preliminary and subject to change as reports are updated. For the most accurate and up-to-date statistics on traffic fatalities, consult the NYPD Motor Vehicle Collisions page or Vision Zero View.

Data Cleaning
We had to do substantial filtering to make our data usable. The motor vehicle collision dataset spanned decades and covered all of New York State, while our weather dataset covered a much shorter period. We therefore filtered both datasets to include only data from 2022. For example, it would make no sense to relate a collision on January 7, 2007 to the weather when we have no weather data for that day. We also filtered the large vehicle collision dataset by latitude and longitude to keep only crashes in the five boroughs of New York City, because weather data was available only for NYC: we could not draw meaningful conclusions by comparing an accident in Rochester with nonexistent weather data. Specifically, we kept crashes within the following bounds (covering New York’s five boroughs): min_lat, max_lat = 40.4774, 40.9176 and min_lon, max_lon = -74.2591, -73.7004.
The filtered dataframe was exported as a new CSV file. The weather data was much easier to filter: we kept all records from 2022 onward to match our collision data. For both datasets, we also removed any rows containing NaN (missing) values.
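The collision filtering described above can be sketched in Python. This is a minimal illustration using only the standard library, not the team's actual script; the MM/DD/YYYY date format and the choice of which columns must be non-empty are assumptions, while the bounding-box values are the ones quoted above.

```python
import csv
import io
import math
from datetime import datetime

# Bounding box for NYC's five boroughs (the values quoted above).
MIN_LAT, MAX_LAT = 40.4774, 40.9176
MIN_LON, MAX_LON = -74.2591, -73.7004

# Columns a row must have to be kept. Assumption: the write-up only says
# NaN values were removed, not which columns were checked.
REQUIRED = ("CRASH DATE", "LATITUDE", "LONGITUDE")


def to_float(value):
    """Parse a numeric CSV field; return None for blanks or NaN."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(x) else x


def clean_collisions(rows, year=2022):
    """Keep rows from `year` whose coordinates fall inside the NYC box."""
    kept = []
    for row in rows:
        if any(not row.get(col) for col in REQUIRED):
            continue  # blank or missing value in a required column
        lat, lon = to_float(row["LATITUDE"]), to_float(row["LONGITUDE"])
        if lat is None or lon is None:
            continue  # unparseable or NaN coordinate
        # Assumption: crash dates are exported as MM/DD/YYYY.
        crash_date = datetime.strptime(row["CRASH DATE"], "%m/%d/%Y")
        if crash_date.year != year:
            continue
        if MIN_LAT <= lat <= MAX_LAT and MIN_LON <= lon <= MAX_LON:
            kept.append(row)
    return kept


# Tiny made-up sample standing in for the real CSV export.
SAMPLE = """CRASH DATE,LATITUDE,LONGITUDE
01/07/2022,40.7128,-74.0060
01/07/2007,40.7128,-74.0060
03/15/2022,43.1566,-77.6088
04/02/2022,,
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned = clean_collisions(rows)
print(len(cleaned))  # 1: only the 2022 crash inside the five boroughs
```

The second row is dropped for its year, the third (a Rochester-area coordinate) for falling outside the box, and the fourth for its missing coordinates.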
Variables in the Dataset
| Variable Name | Description |
|---|---|
| 'CRASH DATE' | The date of the crash event. |
| 'CRASH TIME' | The time of the crash event. |
| 'ZIP CODE' | The ZIP code where the crash occurred. |
| 'LATITUDE' | The latitude coordinate of the crash location. |
| 'LONGITUDE' | The longitude coordinate of the crash location. |
| 'NUMBER OF PERSONS INJURED' | The count of individuals injured in the crash. |
| 'NUMBER OF PERSONS KILLED' | The count of individuals killed in the crash. |
| 'NUMBER OF PEDESTRIANS INJURED' | The count of pedestrians injured in the crash. |
| 'NUMBER OF PEDESTRIANS KILLED' | The count of pedestrians killed in the crash. |
| 'COLLISION_ID' | A unique identifier for each collision event. |
| 'NUMBER OF CYCLISTS INJURED' | The count of cyclists injured in the crash. |
| 'NUMBER OF CYCLISTS KILLED' | The count of cyclists killed in the crash. |
| 'NUMBER OF MOTORISTS INJURED' | The count of motorists (vehicle occupants) injured in the crash. |
| 'NUMBER OF MOTORISTS KILLED' | The count of motorists (vehicle occupants) killed in the crash. |
| 'CONTRIBUTING FACTOR VEHICLE 1' | The primary contributing factor to the crash for the first vehicle involved. |
| 'CONTRIBUTING FACTOR VEHICLE 2' | The primary contributing factor to the crash for the second vehicle involved. |
| 'VEHICLE TYPE CODE 1' | The type of the first vehicle involved in the crash. |
| 'VEHICLE TYPE CODE 2' | The type of the second vehicle involved in the crash. |
Design Rationale
NYC Crash Map
We included several interactive elements in our design:
- Mouseover and mouseout to interact with the map
- Filtering by date
- Filtering by weather condition (rain)
As the user mouses over different parts of the map, we label the borough they are currently looking at and highlight it in yellow. We did this to make the visualization as interpretable as possible: even users who are unfamiliar with New York City can always tell which borough they are looking at.

We chose to show the borough labels only on mouseover because displaying them permanently would clutter the visual and distract from what we are actually trying to highlight: vehicle collision patterns. The mouseover interaction cannot be discovered without interacting with the map. However, we felt this was acceptable because the map is the core of our visualization, and users will naturally move the mouse across it, revealing the interaction as soon as they hover over the large map.
Geographic Distribution
Each crash event is represented by a point on the map using latitude and longitude coordinates. This visualization allows viewers to quickly grasp the distribution and geographical patterns of crashes.
Position of Crash Points
The position of each point on the map serves as a key visual channel, which allows viewers to easily interpret the location of every crash and observe geographical clusters.
Colour of Map
We chose navy blue as the colour of our map and buttons because it is a neutral colour well suited to map visualization and is also one of NYC's colours. It is pleasing to the eye and does not overwhelm the viewer. The buttons are styled with steel blue, and this consistent use of colour provides a visual connection between the primary visualization and the interactive elements.
The user can further interact with the map by filtering for specific dates. Once dates within the available range are selected (out-of-range dates cannot be selected in the filter) and the “filter” button is clicked, the map is populated with red dots representing collisions. The inability to select certain dates is a clear affordance that data for those dates is unavailable. The date selection tool intentionally resembles the date pickers commonly found across the web, so that the interaction feels familiar to the user.
Furthermore, the user can narrow the collisions down by whether it rained when each collision occurred. For example, suppose the user filters collisions from 01/01/2022 to 02/15/2022. Without the rain filter checked, the map shows many red collision points; with the rain filter checked, visibly fewer collisions appear. The checkbox provides an affordance that it can be toggled: a checkmark is either present or absent, and the box resembles other checkable controls across the web (as opposed to a circle, which is more commonly used to pick one of several options). We also include text next to the checkbox describing what the filter does: did it rain when the collision occurred?
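The date and rain filters described above amount to matching each collision with the day's weather, then applying two conditions. The sketch below illustrates that logic in Python; it is not the visualization's code. The names `filter_collisions` and `rainy_days`, the record shape, and the sample values are hypothetical; "rained" uses the precipitation > 0 definition given in The Objective, and days missing from the weather data are assumed dry.

```python
from datetime import date


def rainy_days(precip_by_day):
    """Dates counted as 'bad weather': precipitation > 0."""
    return {day for day, precip in precip_by_day.items() if precip > 0}


def filter_collisions(collisions, start, end, precip_by_day, rain_only=False):
    """Collisions dated within [start, end], optionally only on rainy days.

    Days absent from the weather table are treated as dry (an assumption).
    """
    wet = rainy_days(precip_by_day)
    return [
        c for c in collisions
        if start <= c["date"] <= end and (not rain_only or c["date"] in wet)
    ]


# Hypothetical sample: daily precipitation and four collisions.
precip = {date(2022, 1, 1): 0.0, date(2022, 1, 2): 0.3, date(2022, 2, 20): 1.1}
collisions = [
    {"id": 1, "date": date(2022, 1, 1)},
    {"id": 2, "date": date(2022, 1, 2)},
    {"id": 3, "date": date(2022, 1, 2)},
    {"id": 4, "date": date(2022, 2, 20)},  # outside the selected range
]
start, end = date(2022, 1, 1), date(2022, 2, 15)
all_ids = [c["id"] for c in filter_collisions(collisions, start, end, precip)]
rain_ids = [c["id"] for c in filter_collisions(collisions, start, end, precip, rain_only=True)]
print(all_ids, rain_ids)  # [1, 2, 3] [2, 3]
```

Precomputing the set of rainy dates keeps each collision check to a constant-time set lookup rather than a search through the weather table.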
Date Filter
We chose a straightforward date input filter so that users can easily interact with the map and look up how many accidents happened and where. The filter displays a warning if the dates entered are out of range. (The consistent steel blue borders maintain visual harmony.)
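The out-of-range warning can be expressed as a small validation function. In this sketch the function name, the message text, and the 2022-01-01 to 2022-12-31 bounds are hypothetical (the bounds assume the cleaned data covers all of 2022); it also rejects a start date later than the end date, a check the text does not mention.

```python
from datetime import date

# Hypothetical bounds of the cleaned data (the collision data was cut to 2022).
DATA_START = date(2022, 1, 1)
DATA_END = date(2022, 12, 31)


def date_range_warning(start, end, data_start=DATA_START, data_end=DATA_END):
    """Return a warning message for an unusable range, or None if it is valid."""
    if start > end:
        return "The start date must be on or before the end date."
    if start < data_start or end > data_end:
        return (f"Please choose dates between {data_start.isoformat()} "
                f"and {data_end.isoformat()}.")
    return None


print(date_range_warning(date(2022, 1, 1), date(2022, 2, 15)))  # None
print(date_range_warning(date(2021, 12, 1), date(2022, 1, 5)))
# Please choose dates between 2022-01-01 and 2022-12-31.
```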
Rain Filter
The label “Check if rained?” provides explicit guidance and affordance, enhancing user understanding and interaction.
Colour of Crash
The use of red for the points is meaningful. Red is commonly associated with danger, caution, and emergencies, which aligns with the nature of the car crashes represented. Red also contrasts strongly with the light blue background of the map. We made this deliberate choice to enhance the visibility of the data and make it stand out vividly on the screen.
Bar Chart
The bar chart presents an organized view of the contributing factors and the number of crashes each caused. We also added a mouseover to the bar chart at the bottom that colors each bar red as the user hovers over it. This was done to create an implicit connection between factors and collisions: the hovered bar uses the same red as the crash points on the map. Like the map labels, this feature can only be discovered by actually interacting with the chart.

Colour Consistency
We chose navy blue for the bar chart to keep it consistent with the theme colour, creating a unified and harmonious visual experience.
Efficient Space Utilization
Each factor's label is rotated 45 degrees so that long factor names fit beneath their bars without overlapping, keeping every label readable.
Sorting
The bars are sorted in descending order by crash count, so viewers can quickly identify the major factors.
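The data behind the sorted bar chart is a count of crashes per contributing factor, ordered from highest to lowest. A minimal standard-library sketch, assuming the chart counts 'CONTRIBUTING FACTOR VEHICLE 1' only (the text does not say whether the second vehicle's factor was included); the `factor_counts` name, the optional `skip` parameter, and the sample rows are made up for illustration.

```python
from collections import Counter

FACTOR = "CONTRIBUTING FACTOR VEHICLE 1"


def factor_counts(rows, column=FACTOR, skip=()):
    """(factor, crash count) pairs sorted by count, highest first.

    `skip` lets placeholder values such as "Unspecified" be left out;
    whether the original chart did so is not stated, so it defaults to ().
    """
    counts = Counter(
        row[column] for row in rows
        if row.get(column) and row[column] not in skip
    )
    # most_common() sorts by count, descending; ties keep first-seen order.
    return counts.most_common()


sample = [
    {FACTOR: "Driver Inattention/Distraction"},
    {FACTOR: "Following Too Closely"},
    {FACTOR: "Driver Inattention/Distraction"},
    {FACTOR: "Unspecified"},
    {FACTOR: ""},  # blank factor: ignored
]
print(factor_counts(sample, skip=("Unspecified",)))
# [('Driver Inattention/Distraction', 2), ('Following Too Closely', 1)]
```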
The Objective
With our visualization, we wanted to tell the story of how different factors correlate with car accidents. Our original hypothesis was that weather conditions, particularly bad weather (generally defined as precipitation > 0), would lead to a significant increase in car accidents.


This was the original justification for overlaying collisions on a map of New York’s five boroughs and including overlays of weather information as the user filtered by day. The map effectively shows where most accidents occur, and the filtering lets the viewer explore different times of the year when precipitation generally differs (more in the wintertime, less in the summertime). Densely populated Manhattan has significantly more accidents in absolute terms during our time period than parts of New York such as Staten Island. Our story was meant as a “cautionary tale” informing the viewer of factors that may be correlated with vehicle collisions (we were careful to distinguish correlation from causation). We did not want to leave the reader with the impression that they should be afraid of driving in New York’s five boroughs because of the overwhelming number of car collisions. This is why we wanted to highlight the reasons for collisions, to help the viewer make better decisions about both how and when they drive.
To our surprise, after combining the weather and crash data, we found that there are much better predictors of the number of collisions than weather. At that point we decided to add the bar chart of contributing factors, since weather turned out not to be a particularly strong indicator (something we only discovered by building the visualizations and analyzing the data as the project went on!).
The story ultimately conveys that drivers have considerable power to limit their risk of getting into an accident. Avoid distracted driving and follow traffic rules: these are the clear takeaways from the bar chart. This may end up telling a more positive story than we originally expected. You can’t control the weather, but you can control your own actions as a driver. Hence, the finding that weather is a weaker predictor of collisions than we hypothesized suggests that drivers need not worry about external factors as much as one might expect.
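One simple way to check the claim that weather is a weak predictor is to compare the average number of collisions per day on wet days (precipitation > 0) with the average on dry days. The sketch below is an illustration, not the analysis the team performed; the `mean_daily_collisions` name and the sample data are hypothetical. Comparing per-day averages rather than raw totals matters because rainy days are typically fewer than dry days, so rainy-day totals look smaller even if each rainy day is just as dangerous. Like the visualization itself, this measures correlation, not causation.

```python
from collections import Counter
from statistics import mean


def mean_daily_collisions(collision_dates, precip_by_day):
    """Mean collisions per day on wet (precip > 0) and dry days.

    Only days present in `precip_by_day` are considered; a day with
    weather data but no collisions counts as 0 collisions.
    Returns (wet_mean, dry_mean), with None for a group that has no days.
    """
    per_day = Counter(collision_dates)  # missing days read as 0
    wet = [per_day[d] for d, p in precip_by_day.items() if p > 0]
    dry = [per_day[d] for d, p in precip_by_day.items() if p <= 0]
    return (mean(wet) if wet else None, mean(dry) if dry else None)


# Hypothetical sample: four days of weather and their collision dates.
precip = {"2022-01-01": 0.0, "2022-01-02": 0.5, "2022-01-03": 0.0, "2022-01-04": 0.2}
dates = (["2022-01-01"] * 3 + ["2022-01-02"] * 4
         + ["2022-01-03"] * 5 + ["2022-01-04"] * 2)
wet_mean, dry_mean = mean_daily_collisions(dates, precip)
print(wet_mean, dry_mean)  # wet days average 3 collisions, dry days 4
```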