STAT 29000: Project 6 — Spring 2022
Motivation: Being able to analyze and create good visualizations is a skill that is invaluable in many fields. It can be pretty fun too! In this project, we are going to dive into plotting with matplotlib through an open-ended project.
Context: We’ve been working hard all semester and learning a lot about web scraping. In this project we are going to ask you to examine some plots, write a little bit, and use your creative energies to create good visualizations of the flight data using matplotlib, the go-to plotting library for many. In the next project, we will continue to learn about and become comfortable using matplotlib.
Scope: Python, matplotlib, visualizations
Dataset(s)
The following questions will use the following dataset(s):
- /depot/datamine/data/flights/subset/*.csv
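As a starting point, the files matched by this pattern could be combined into one table. The sketch below assumes pandas is available and that every CSV shares the same header. The function name `load_flights` and the column names in the commented call are illustrative assumptions, not part of the assignment, so check the header of one file before relying on them.

```python
import glob

import pandas as pd


def load_flights(pattern, usecols=None):
    """Read every CSV matching `pattern` into a single DataFrame.

    `usecols` optionally limits the columns that are read, which keeps
    memory use down when a plot only needs a few fields.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [pd.read_csv(path, usecols=usecols) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Example call (the column names are assumptions; inspect one file first):
# flights = load_flights(
#     "/depot/datamine/data/flights/subset/*.csv",
#     usecols=["Year", "UniqueCarrier", "DepDelay"],
# )
```

Reading only the columns a plot needs is a design choice here: the full set of files may be large, and a smaller frame is quicker to group and summarize.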
Questions
Question 1
When submitting your .ipynb file for this project, if the .ipynb file doesn’t render in Gradescope, please export the notebook as a PDF and submit that as well; you will be helping the graders a lot!
Here is the description of the 2009 Data Expo poster competition. The object of the competition was to visualize interesting information from the flights dataset.
The winners of the competition were:
- First place: Congestion in the sky: Visualising domestic airline traffic with SAS (pdf, 550k) Rick Wicklin and Robert Allison, SAS Institute.
- Second place: Delayed, Cancelled, On-Time, Boarding … Flying in the USA (pdf, 13 meg) Heike Hofmann, Di Cook, Chris Kielion, Barret Schloerke, Jon Hobbs, Adam Loy, Lawrence Mosley, David Rockoff, Yuanyuan Huang, Danielle Wrolstad and Tengfei Yin, Iowa State University.
- Third place: A tale of two airports: An exploration of flight traffic at SFO and OAK. (pdf, 770k) Charlotte Wickham, UC Berkeley.
- Honourable mention: Minimizing the Probability of Experiencing a Flight Delay (pdf, 7 meg) Tanujit Dey, David Phillips and Patrick Steele, College of William & Mary.
The other posters were:
- The Airline Data Set… What’s the big deal? (pdf, 80k) Michael Kane and Jay Emerson, Yale.
- Make a Smart Choice on Booking Your Flight! (pdf, 2 meg) Yu-Hsiang Sun, Case Western Reserve University.
- Airline Data for Raleigh-Durham International Michael T. Crotty, SAS Institute Inc.
- What Airlines Would You Avoid for Your Next Flight? Haolai Jiang and Jung-Chao Wang, Western Michigan University.
Examine all 8 posters and write a single sentence for each poster with your first impression(s). An example of an impression that will not get full credit would be: "My first impression is that this poster is bad and doesn’t look organized.". An example of an impression that will get full credit would be: "My first impression is that the author had a good visualization-to-text ratio and it seems easy to follow along.".
- 8 bullets, each containing a sentence with the first impression of the 8 visualizations. Order should be "first place", to "honourable mention", followed by "other posters" in the given order. Or, label which graphic each sentence is about.
Question 2
Creating More Effective Graphs by Dr. Naomi Robbins and The Elements of Graphing Data by Dr. William Cleveland at Purdue University are two excellent books about data visualization. Read the following excerpts from the books (respectively), and list 2 things you learned or found interesting from each book.
- Two bullets for each book with items you learned or found interesting.
Question 3
Of the 7 posters with at least 3 plots and/or maps, choose 1 poster that you think you could improve upon or "out plot". Create 4 plots/maps that either:
1. Improve upon a plot from the poster you chose, or
2. Show a completely different plot that does a good job of getting an idea or observation across, or
3. Ruin a plot. Purposefully break the best practices you’ve learned about in order to make the visualization misleading. (limited to 1 of the 4 plots)
For each plot/map where you choose to do (1), include 1-2 sentences explaining what exactly you improved upon and how. Point out some of the best practices from the 2 provided texts that you followed.
For each plot/map where you choose to do (2), include 1-2 sentences explaining your graphic and outlining the best practices from the 2 texts that you followed.
For each plot/map where you choose to do (3), include 1-2 sentences explaining what you changed, what principle it broke, and how it made the plot misleading or worse.
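To make the difference between a "ruined" plot and a fairer one concrete, here is a minimal sketch that draws the same bars twice: once with a truncated y-axis and once with a zero baseline. Starting bar charts at zero is a commonly cited guideline, used here only as an example of a principle you might break or follow. The carrier codes and percentages are made-up values for illustration, not results from the flights data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt

# Made-up values for illustration only; not taken from the flights data.
carriers = ["AA", "DL", "UA", "WN"]
on_time_pct = [78.0, 80.5, 77.2, 81.1]

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(9, 3.5))

# Misleading version: the y-axis starts at 76, so small gaps look large.
ax_bad.bar(carriers, on_time_pct)
ax_bad.set_ylim(76, 82)
ax_bad.set_title("Truncated axis (misleading)")

# Fairer version: the axis starts at 0, so bar lengths are
# proportional to the values they represent.
ax_good.bar(carriers, on_time_pct)
ax_good.set_ylim(0, 100)
ax_good.set_title("Zero baseline")

# Both panels get labeled axes with units.
for ax in (ax_bad, ax_good):
    ax.set_xlabel("Carrier")
    ax.set_ylabel("On-time flights (%)")

fig.tight_layout()
```

In the write-up for a ruined plot, the accompanying sentences would name the change (the axis limits), the principle it breaks, and the false impression it creates (a few percentage points look like a large gap).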
While we are not asking you to create a poster, please use Jupyter notebooks to keep your plots, code, and text nicely formatted and organized. The more like a story your project reads, the better. In this project, we are restricting you to use matplotlib in Python. While there are many interesting plotting packages like plotly and plotnine, we really want you to take the time to dig into matplotlib and learn as much as you can.
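For anyone new to matplotlib, the sketch below shows one possible shape for a plot built with its object-oriented interface (`plt.subplots` plus methods on the returned `Axes`). It draws a sorted dot plot, a form often associated with Cleveland's work. The delay values are hypothetical placeholders; in practice they would come from a summary of the flights data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt

# Hypothetical summary values (minutes); replace with values computed
# from the flights data.
mean_delay = {"AA": 9.4, "DL": 7.1, "UA": 11.8, "WN": 8.3, "B6": 12.5}

# Sort by value so the reader can rank the carriers at a glance.
items = sorted(mean_delay.items(), key=lambda kv: kv[1])
labels = [name for name, _ in items]
values = [value for _, value in items]
positions = list(range(len(labels)))

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(values, positions, "o", color="black")  # one dot per carrier
ax.set_yticks(positions)
ax.set_yticklabels(labels)
ax.set_xlabel("Mean departure delay (minutes)")
ax.set_ylabel("Carrier")
ax.set_title("Mean departure delay by carrier (illustrative data)")

# Light vertical grid lines help the eye read values off the x-axis.
ax.grid(axis="x", linestyle=":", linewidth=0.5)

# Remove the top and right frame lines to reduce clutter.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

fig.tight_layout()
```

Working through the `Axes` object rather than the implicit `plt.*` state makes it easier to control each element (ticks, labels, grid, spines) individually, which is most of what improving a plot involves.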
- All associated Python code you used to wrangle the data and create your graphics.
- 4 plots (and the Python code to produce the plots).
- 1-2 sentences per plot explaining what exactly you improved upon, what best practices from the texts you used, and how. If it is a brand new visualization, describe and explain your graphic, outlining the best practices from the 2 texts that you followed. If it is the ruined plot you chose, explain what you changed, what principle it broke, and how it made the plot misleading or worse.
Question 4
Now that you’ve been exploring data visualization, copy, paste, and update your first impressions from question (1) with your updated impressions. Which impression changed the most, and why?
- 8 bullets with updated impressions (still just a sentence or two) from question (1).
- A sentence explaining which impression changed the most and why.
Please double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted. In addition, please review our submission guidelines before submitting your project.