Exploring US Wine Price and Quality Across Regions
The visualization explores the relationship between wine price and quality across New York, Washington, Oregon, and California.
Data Description
The dataset "wines.json" contains a subsample of wine reviews from four US wine regions: New York, Washington, Oregon, and California. Each data point includes information on the price of the wine and its quality score on a scale from 0 to 100.

Data Cleaning
The dataset starts with 909 records, and filtering removes 64 of them. Besides joke wine names, some records list impossible locations such as Mars as the wine region. Others have a score outside the 0-100 range, which puts them outside the scope of the research question, since it concerns the relationship between price and the 0-100 score. To filter the data, I first kept only records located in New York, Washington, Oregon, or California. I then kept only records with a score between 0 and 100. After creating a basic plot, I also noticed some outliers: most wines are priced under 300, but a few are priced above 300, so I filtered those out as well. After applying all filters, 845 records remain.
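The filters described above could be sketched roughly as follows. This is not the project's actual code: the field names (state, points, price) follow the variable table, while the helper names, the exact state strings, the check for a positive numeric price, and the sample records are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the cleaning filters (not the project's real code).
// Assumes each record has `state`, `points`, and `price` fields.
const VALID_STATES = new Set(["New York", "Washington", "Oregon", "California"]);
const MAX_PRICE = 300; // outlier cutoff mentioned in the text

function isCleanRecord(d) {
  const inRegion = VALID_STATES.has(d.state);
  const validScore = Number.isFinite(d.points) && d.points >= 0 && d.points <= 100;
  // Assumption: a usable price is a positive number no greater than the cutoff.
  const validPrice = Number.isFinite(d.price) && d.price > 0 && d.price <= MAX_PRICE;
  return inRegion && validScore && validPrice;
}

function cleanWines(records) {
  return records.filter(isCleanRecord);
}

// Made-up records for illustration only (not taken from wines.json).
const sample = [
  { title: "Example Red", state: "Oregon", points: 90, price: 35 },
  { title: "Example Martian", state: "Mars", points: 88, price: 20 },
  { title: "Example Overscored", state: "California", points: 120, price: 40 },
  { title: "Example Pricey", state: "Washington", points: 95, price: 450 },
];

// Only the first record passes all three checks.
console.log(cleanWines(sample).map((d) => d.title));
```

Keeping the three checks in one predicate makes it easy to count how many records each rule removes, for example by filtering with each condition separately.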
Variables in the Dataset
| Variable Name | Description |
| --- | --- |
| 'title' | The name of the wine. |
| 'variety' | The grape variety used to make the wine. |
| 'winery' | The name of the winery producing the wine. |
| 'region_1' | The specific wine region where the grapes were grown. |
| 'country' | The country where the wine was produced. |
| 'points' | The quality score of the wine on a scale from 0 to 100. |
| 'price' | The price of the wine per bottle. |
| 'taster_name' | The name of the taster who reviewed the wine. |
| 'state' | The state within the country where the wine was produced. |
Design Rationale
To visualize the relationship between price and quality and to compare wine regions, I used a scatterplot with a legend.

Log Scale
With a linear scale, many data points were cluttered on the left, where prices are lower. To give a clearer view, I switched to a log scale, which spreads those points out and makes individual values easier to distinguish and analyze.
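To illustrate why a log scale spreads out the low-priced wines, here is a minimal sketch that compares a linear and a log mapping from price to pixel position. It is written in plain JavaScript so it runs on its own; in D3 the log mapping is what d3.scaleLog() provides. The helper names, the domain, and the pixel range are assumptions, not the project's settings.

```javascript
// Minimal scale sketch (illustrative; not the project's actual configuration).
// Each factory returns a function mapping a data value to a pixel position.
function makeLinearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

function makeLogScale([d0, d1], [r0, r1]) {
  // Requires strictly positive domain values, since log(0) is undefined.
  const logMin = Math.log10(d0);
  const logMax = Math.log10(d1);
  return (value) => r0 + ((Math.log10(value) - logMin) / (logMax - logMin)) * (r1 - r0);
}

const domain = [1, 1000]; // hypothetical price domain
const range = [0, 600];   // hypothetical plot width in pixels
const xLinear = makeLinearScale(domain, range);
const xLog = makeLogScale(domain, range);

// Print price, linear position, and log position side by side.
for (const price of [10, 20, 100]) {
  console.log(price, xLinear(price).toFixed(1), xLog(price).toFixed(1));
}
```

On the linear scale, prices of 10 and 20 land only a few pixels apart, while the log scale gives each doubling of price the same horizontal distance.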
jitter()
In addition to the log scale, I implemented a jitter function that adds random offsets to the circle positions to reduce overlap.
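A jitter helper of this kind might look like the sketch below. The source does not show the real implementation, so the offset size, the uniform distribution, and the injectable random source (added here only to make the function testable) are assumptions.

```javascript
// Sketch of a jitter helper (assumed behaviour, not the project's code).
// Returns `value` shifted by a uniform random offset in [-amount, amount).
function jitter(value, amount = 2, random = Math.random) {
  return value + (random() * 2 - 1) * amount;
}

// Example: five circles that would otherwise share the same x position.
const cx = 120; // hypothetical pixel position
const jittered = Array.from({ length: 5 }, () => jitter(cx, 3));
console.log(jittered);
```

One design choice to consider is applying the offset in pixel space after scaling rather than to the raw data, so the amount of displacement looks the same everywhere on a log axis.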


Colour Scheme
I used the state colours of New York, Washington, Oregon, and California. All four colours are visually distinguishable from each other. I also lowered the opacity of each circle so that individual data points remain identifiable, especially within clusters.
selection.on("mouseover")
When users hover their mouse over data points, they can instantly view the title of the corresponding wine. Simultaneously, the hovered point enlarges, providing a more engaging and intuitive experience.
To keep the data points both identifiable and interactive, I chose a radius small enough that the circles do not all overlap, yet large enough to hover over easily. I also made the font size large enough to read comfortably. With this approach, one more layer of information can be communicated to the user.

Legend
I also attached a D3 on("mouseover") handler to the legend. When users hover over a state in the legend, the associated data points on the graph are highlighted with increased opacity, making them more prominent. At the same time, the opacity of the other data points decreases, which focuses attention on the selected state's data.
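The highlight rule can be expressed as a small pure function, sketched below. The three opacity values and the function name are hypothetical, since the source does not state the ones actually used.

```javascript
// Sketch of the legend-hover opacity rule (values are assumptions).
const BASE_OPACITY = 0.5;      // default when nothing is hovered
const HIGHLIGHT_OPACITY = 0.9; // points belonging to the hovered state
const DIMMED_OPACITY = 0.1;    // points from every other state

// pointState: state of the data point; hoveredState: legend entry under
// the mouse, or null when the mouse is not over the legend.
function pointOpacity(pointState, hoveredState) {
  if (hoveredState === null) return BASE_OPACITY;
  return pointState === hoveredState ? HIGHLIGHT_OPACITY : DIMMED_OPACITY;
}

console.log(pointOpacity("Oregon", "Oregon"));   // highlighted
console.log(pointOpacity("New York", "Oregon")); // dimmed
console.log(pointOpacity("New York", null));     // default
```

With D3, a function like this could be passed to selection.style("opacity", ...) on the circles inside the legend's mouseover listener, and called again with null in a mouseout listener to restore the default.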
The Objective
The research question addressed by this data visualization is twofold:
Relationship between Price and Quality
The visualization explores whether a relationship exists between the price of wines and their quality scores on a 0-100 scale. It shows a generally positive correlation between price and quality.
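One way to put a number on the visual impression is a Pearson correlation between log price and score. The sketch below is an optional check, not part of the original project; the function name and the four sample values are made up for illustration and are not results from wines.json.

```javascript
// Pearson correlation coefficient between two equal-length numeric arrays.
function pearson(xs, ys) {
  const mean = (arr) => arr.reduce((sum, v) => sum + v, 0) / arr.length;
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < xs.length; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Made-up values for illustration only.
const prices = [10, 20, 40, 80];
const points = [84, 87, 89, 93];

// Correlate on log price, matching the log scale used in the plot.
const r = pearson(prices.map((p) => Math.log10(p)), points);
console.log(r.toFixed(3));
```

A value near +1 indicates a strong positive linear association, a value near 0 indicates little linear association, and the sign gives the direction.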
Comparison of Wine Regions
The visualization also compares the wine regions (New York, Washington, Oregon, and California) in terms of wine quality and price. It reveals that wines from New York tend to be the least expensive and to have lower quality scores. In contrast, wines from California tend to have higher quality scores and higher prices. This comparison provides insight into the relative quality and affordability of wines across the four regions.
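A regional comparison like this can be backed by a per-state summary. The sketch below groups records by state and reports median price and median score. The helper names and the sample records are hypothetical and are not drawn from wines.json; only the field names (state, price, points) follow the variable table.

```javascript
// Median of a numeric array (does not mutate the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group records by `state` and summarise price and points per group.
function summarizeByState(records) {
  const groups = new Map();
  for (const d of records) {
    if (!groups.has(d.state)) groups.set(d.state, []);
    groups.get(d.state).push(d);
  }
  const summary = {};
  for (const [state, rows] of groups) {
    summary[state] = {
      count: rows.length,
      medianPrice: median(rows.map((d) => d.price)),
      medianPoints: median(rows.map((d) => d.points)),
    };
  }
  return summary;
}

// Made-up records for illustration only.
const demo = [
  { state: "New York", price: 18, points: 85 },
  { state: "New York", price: 22, points: 87 },
  { state: "California", price: 40, points: 90 },
  { state: "California", price: 60, points: 92 },
  { state: "California", price: 50, points: 91 },
];
console.log(summarizeByState(demo));
```

Medians are used here instead of means because wine prices are skewed, so a few expensive bottles would otherwise dominate a state's average.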
