Amanda Hickman

If you haven’t already created accounts on both Workbench and Datawrapper please go ahead and do that.

Finding and Cleaning Data

Good starting points for finding data:

Peter Aldhous has some fantastic tips for getting creative in his lesson plan from his Data Visualization course.

Mago Torres has a few great stories about the benefit of a little creativity if you’re working on an international story. If the US won’t give you details of a meeting between US and Mexican officials, don’t assume you can’t get what you need from the Mexican government.

It’s People!

"Data is like soylent green"

Quiz: what is the fastest way to reduce the number of murders in a single precinct?

Using Workbench to Clean Data

Peter Aldhous has a great walk through that will show you how to use OpenRefine to clean up data.

We’re going to use Workbench, only because I think it does a nice job of providing an easy to read audit trail. The two tools aren’t exactly interchangeable but knowing what is possible is the first step.

I replicated those data cleaning exercises so we can walk through them and explore Workbench. The first uses a CSV of data from the U.S. Energy Information Administration on regional oil production. The numbers given are thousands of barrels per day. (Specifically, I used their data browser to get “Total Petroleum and Other Liquids” from 2000 to 2017)

You can see my work at: but don’t use “duplicate” – we’re going to walk through this from scratch.

  1. Start by duplicating my workflow:

  2. Transform it into a functional spreadsheet using …
    • “Delete Rows” to delete rows 1-3 and 5-8
    • Edit the cell that is missing a header
    • “Rows to Header” to move the header where we want them
    • Rename the columns to drop those annoying decimals: delete me, region, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
    • Use the context menu on the first column, or “Select Columns” to delete it.
  3. Once we’ve got the data cleaned up, we can actually reshape it into something we might be able to chart.
    • Select “Reshape” and then “Transpose” to transpose the data
    • Use “Rename” to label the “year” column
    • “Convert to Date and Time”
    • “Line Chart”

Secretly, Workbench’s charting is a little wonky, so I’d move over to another charting tool if I wanted to see this visualized.

More Cleaning in Workbench

USA Spending tracks federal government spending. We can use their Advanced Search to pull up a list of grants to DUNS 9214214 and 124726725, aka Berkeley and Stanford. The data they provide is a lot cleaner than it was in 2014, but you can see an example of the kind of filtering you can do in my demo at

Note that this demo makes use of regular expressions which are awesome and also exasperating. Rubular and Regex101 might make your regular expression journey more sane.

Challenge: The state of California publishes data on who gave money to support or oppose individual ballot measures. Find and tidy the Prop 10 donors.

Making Really Good Charts

Source publishes a lot of great “How we built it” stories that ought to inspire you. Their Q & A with Alvin Chang on his quick-turnaround Vox graphic that showed unanswered questions from the Kavanaugh hearings is great.

animated step-by-step guide to improving bar charts from Dark Horse Analytics

Look at what makes NYT Charts work. I’m 100% open to design innovation but there are some pretty good established formulas that we can lean on:

Chart: The Declining Value of Low-Income Housing Tax Credits

These 95 Apartments Promised Affordable Rent in San Francisco. Then 6,580 People Applied. New York Times, May 12, 2018

Don’t Make Bad Charts

Are millenials really having fewer children? The answer is probably “duh, of course they are”. But looking at stories about the data there are a lot of examples of y-axis shenanigans:

Discussion: let’s talk about what data would get at that question, and go look at the data. We want two things: mean age at first birth, and birth rate. And we want to know whether millenials stand out or are just being consistent with long-standing trends.

Making A Simple Chart

Using the data at (which I pulled from the CDC report cited in most of the pieces above), let’s make a chart in Datawrapper.

  1. Find the URL to download the csv from workbench or copy the live link.
  2. Make a new chart in Datawrapper. Use “Link External Dataset” to provide the URL. We could probably use workbench to serve the data but Datawrapper is a safer choice.
  3. Power through step 2 (“Check and Describe”) but we’ll come back to it.
  4. Look at the raw chart.
  5. Back to Step 2 (“Check and Describe”) to select individual columns that we want to hide from the visualization.
  6. Refine, Annotate and Design it. We’ve got 3 different takes on this data to crib from. Set up a title, a description, any notes that you think are appropriate. When you’re happy with your work, publish it and share the URL, with your name, at

One More

ICE Chart Questions: What does this chart show? What is 38%? 52%? What do those represent? What does the grey represent?

Pew covered the same numbers but added context and specificity to their chart (and ommitted the 3D renderings). Pew Chart

I manually copied out the data from the ICE chart above. You can paste it straight into Datawrapper:

2016	2017	Percent Change
Total Arrests	30028	41318	37.6%
Convicted Criminals	25786	30473	18.2%
At-Large Arrests	8381	12786	52.6%

note: If copy and paste aren’t working for some reason, you can also grab the data in a CSV. Right click to grab the URL for the data, or download it and upload it to Datawrapper.

Spend some time in Datawrapper experimenting with how you want to display this data.

Here’s my version of those same numbers:

More Recent Examples

Charts to Discuss

Homework, Due Thursday

Read New bill would finally tear down federal judiciary’s ridiculous paywall Ars Technica, September 17, 2018

The article includes a chart that shows just how much PACER’s revenue has grown in the 30 years since it was founded. It’s a perfectly good chart but we can definitely apply some of the basic design guidelines we looked at in class to improve on it.

Ars Technica cribbed their chart from the Free Law Project, How Much Money Does PACER Make? Free Law Project, November 14, 2016 and the Free Law Project shared the source of their data. You can pull the xls url into Workbench or just copy and paste the data. If copy and paste is acting wonky, you can also use this csv. chart from arstechnica

Re-make or re-design the Ars Technica chart in Datawrapper:

Year	PACER Revenue (Millions)
1995	5
1996	7
1997	8
1998	11
1999	14
2000	14
2001	12
2002	18
2003	28
2004	38
2005	46
2006	62
2007	65
2008	78
2009	89
2010	103
2011	114
2012	124
2013	148
2014	145
2015	145

Post a link to your chart at:

For extra credit, look at the notes from the Free Law Project and see if you can find 2016 and 2017 figures. Pay attention not only to the data, but also to the framing. Give your chart a title, think about the labels you do and don’t want to include.

Identify at least three data sources that you’re interested in working with. One should be about places. Pull each into Workbench and make them Public. Share the URLs at:

Resources to Keep Learning

If you don’t have a copy of Sarah Cohen’s Numbers in the Newsroom get one. Read it through. She covers a lot of vital information about working with and writing about numbers.

If you don’t already feel comfortable with spreadsheets, you should work through the exercises in AMI’s Spreadsheet Basics. If you don’t have Safari or want additional refreshers, take a look at…

If you’re ready to do more with the data and charts…