Amanda Hickman
If you haven’t already created accounts on both Workbench and Datawrapper please go ahead and do that.
Good starting points for finding data:
Peter Aldhous has some fantastic tips for getting creative in his lesson plan from his Data Visualization course.
Mago Torres has a few great stories about the benefit of a little creativity if you’re working on an international story. If the US won’t give you details of a meeting between US and Mexican officials, don’t assume you can’t get what you need from the Mexican government.
Quiz: what is the fastest way to reduce the number of murders in a single precinct?
Peter Aldhous has a great walk through that will show you how to use OpenRefine to clean up data.
We’re going to use Workbench, only because I think it does a nice job of providing an easy to read audit trail. The two tools aren’t exactly interchangeable but knowing what is possible is the first step.
I replicated those data cleaning exercises so we can walk through them and explore Workbench. The first uses a CSV of data from the U.S. Energy Information Administration on regional oil production. The numbers given are thousands of barrels per day. (Specifically, I used their data browser to get “Total Petroleum and Other Liquids” from 2000 to 2017)
You can see my work at: https://app.workbenchdata.com/workflows/5908/ but don’t use “duplicate” – we’re going to walk through this from scratch.
Start by duplicating my workflow: https://app.workbenchdata.com/workflows/6630/
delete me, region, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
Secretly, Workbench’s charting is a little wonky, so I’d move over to another charting tool if I wanted to see this visualized.
USA Spending tracks federal government spending. We can use their Advanced Search to pull up a list of grants to DUNS 9214214 and 124726725, aka Berkeley and Stanford. The data they provide is a lot cleaner than it was in 2014, but you can see an example of the kind of filtering you can do in my demo at https://app.workbenchdata.com/workflows/5912/
Note that this demo makes use of regular expressions which are awesome and also exasperating. Rubular and Regex101 might make your regular expression journey more sane.
Challenge: The state of California publishes data on who gave money to support or oppose individual ballot measures. Find and tidy the Prop 10 donors.
Source publishes a lot of great “How we built it” stories that ought to inspire you. Their Q & A with Alvin Chang on his quick-turnaround Vox graphic that showed unanswered questions from the Kavanaugh hearings is great.
Look at what makes NYT Charts work. I’m 100% open to design innovation but there are some pretty good established formulas that we can lean on:
These 95 Apartments Promised Affordable Rent in San Francisco. Then 6,580 People Applied. New York Times, May 12, 2018
Are millenials really having fewer children? The answer is probably “duh, of course they are”. But looking at stories about the data there are a lot of examples of y-axis shenanigans:
Discussion: let’s talk about what data would get at that question, and go look at the data. We want two things: mean age at first birth, and birth rate. And we want to know whether millenials stand out or are just being consistent with long-standing trends.
Using the data at https://app.workbenchdata.com/workflows/6631/ (which I pulled from the CDC report cited in most of the pieces above), let’s make a chart in Datawrapper.
Questions: What does this chart show? What is 38%? 52%? What do those represent? What does the grey represent?
Pew covered the same numbers but added context and specificity to their chart (and ommitted the 3D renderings).
I manually copied out the data from the ICE chart above. You can paste it straight into Datawrapper:
2016 2017 Percent Change
Total Arrests 30028 41318 37.6%
Convicted Criminals 25786 30473 18.2%
At-Large Arrests 8381 12786 52.6%
note: If copy and paste aren’t working for some reason, you can also grab the data in a CSV. Right click to grab the URL for the data, or download it and upload it to Datawrapper.
Spend some time in Datawrapper experimenting with how you want to display this data.
Here’s my version of those same numbers:
Read New bill would finally tear down federal judiciary’s ridiculous paywall Ars Technica, September 17, 2018
The article includes a chart that shows just how much PACER’s revenue has grown in the 30 years since it was founded. It’s a perfectly good chart but we can definitely apply some of the basic design guidelines we looked at in class to improve on it.
Ars Technica cribbed their chart from the Free Law Project, How Much Money Does PACER Make? Free Law Project, November 14, 2016 and the Free Law Project shared the source of their data. You can pull the xls url into Workbench or just copy and paste the data. If copy and paste is acting wonky, you can also use this csv.
Re-make or re-design the Ars Technica chart in Datawrapper:
Year PACER Revenue (Millions)
1995 5
1996 7
1997 8
1998 11
1999 14
2000 14
2001 12
2002 18
2003 28
2004 38
2005 46
2006 62
2007 65
2008 78
2009 89
2010 103
2011 114
2012 124
2013 148
2014 145
2015 145
Post a link to your chart at: https://etherpad.opennews.org/p/maps_and_charts
For extra credit, look at the notes from the Free Law Project and see if you can find 2016 and 2017 figures. Pay attention not only to the data, but also to the framing. Give your chart a title, think about the labels you do and don’t want to include.
Identify at least three data sources that you’re interested in working with. One should be about places. Pull each into Workbench and make them Public. Share the URLs at: https://etherpad.opennews.org/p/maps_and_charts
If you don’t have a copy of Sarah Cohen’s Numbers in the Newsroom get one. Read it through. She covers a lot of vital information about working with and writing about numbers.
If you don’t already feel comfortable with spreadsheets, you should work through the exercises in AMI’s Spreadsheet Basics. If you don’t have Safari or want additional refreshers, take a look at…
If you’re ready to do more with the data and charts…