Homework Review

The Age That Women Have Babies: How a Gap Divides America, New York Times, 4 August 2018.

The New York Times went even deeper on the data we were playing with last week: they animated how the spread of ages changed year by year, mapped the average age of first-time mothers for every county in the US, and broke first-time mothers into a few categories:

NYT map of first time mothers by US county

NYT map of first time mothers by education and marriage

Why maps?

We use maps to understand the data, to find a story, to tell a story. Some of my favorite examples:

The Expanding News Desert, UNC (2018)
How we mapped Homan Square, Source + The Guardian (2018)
Something in the water, King’s College School of Journalism (2017)
Visualize Transit-Rich Housing, Sasha Aickin’s personal project (Spring 2018)
Murder with Impunity, Washington Post (2018)
Fatal Force, Washington Post (2016)
How fast is LAFD where you live?, LA Times (2012)
Poisoned Place, NPR (2011): map, about the data
At Risk in a Big Quake: 39 of San Francisco’s Top High Rises, New York Times (2018)
San Francisco Public Press layered the map of un-retrofitted soft-story buildings over the liquefaction map provided by the USGS to create a very rough list of unsafe buildings in San Francisco.
Mapping the Shoreline Building Boom as Seas Rise, San Francisco Public Press (2017)
ProPublica & The Lens: Losing Ground, ProPublica (2017)
Borderlands, NPR (2014)
CALmatters mapped all the housing initiatives on the June 2018 ballot, CALmatters (2018)
A quick caveat about normalization, XKCD

North Bay Fires (2017)

KQED's Property and Structures Damage map was the most visited during the fires.
KQED looks at fire hazard zones, which are public record. Most of the devastating 2017 Napa and Sonoma fires were in “moderate” fire hazard zones.

Projections, Shapes, Points, and Lines

Mapping Points

“Geocoding” refers to the process of finding the latitude/longitude pair for an address or other location description. To plot a location on a map, you need its latitude and longitude: to a mapping tool, “219 West 40th Street” means nothing without coordinates.

Geocoding is often challenging because there aren't many good free resources for batch jobs (processing many addresses at once). The Geocoding Tip Sheet is a roundup of good options, but public data sources often already include coordinates.
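Here is a minimal Python sketch of what a geocoded address becomes once you have its coordinates. It assumes GeoJSON (RFC 7946) as the output format, which these notes don't mention. The `make_point` helper is hypothetical and the coordinates are placeholders, not the real location of that address.

```python
# Hypothetical helper: wrap a geocoder's (lat, lon) result as a GeoJSON Feature.
# GeoJSON lists coordinates as [longitude, latitude], the reverse of how
# people usually say them, and swapping them is a classic mapping bug.

def make_point(lat: float, lon: float, **properties) -> dict:
    """Return a GeoJSON Point Feature for one geocoded location."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Placeholder coordinates for illustration only, NOT a real geocoding result.
feature = make_point(40.0, -74.0, address="219 West 40th Street")
print(feature["geometry"]["coordinates"])  # [-74.0, 40.0]
```

The range checks catch the most common mistake: feeding a longitude like -122 into the latitude slot fails immediately instead of silently plotting a point somewhere strange.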

Mapping Lines

We rarely use lines in intro maps, but a line is simply a series of two or more points connected in order.
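Because a line is just an ordered list of points, you can measure one by adding up the distance between each pair of neighbors. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6,371 km, which is an approximation. The route itself is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def line_length_km(points):
    """A line is an ordered list of points: sum the length of each segment."""
    if len(points) < 2:
        raise ValueError("a line needs at least two points")
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))


# Hypothetical line: three points along the equator, one degree apart.
route = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
print(round(line_length_km(route), 1))  # 222.4
```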

Mapping Polygons

ZIP codes, council districts, police precincts: these are all polygons, and most of your maps will use them. Polygons are usually defined in one of two specialized file formats, a “Shapefile” or a “KML” file. The syntax of the two differs, but they contain basically the same information: the polygon called “Bronx CB 04” is defined by this series of lat/lon pairs.
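To see why “a polygon is a series of lat/lon pairs” is enough to work with, here is a small Python sketch. It stores a hypothetical district as a closed ring and uses the standard ray-casting test to ask whether a point falls inside it. The sketch treats coordinates as a flat plane, which is fine for small areas. Real shapefiles and KML files usually store longitude first; this sketch uses (lat, lon) only to match the description above.

```python
# Hypothetical district outline as a closed ring of (lat, lon) pairs.
# The last pair repeats the first, which is how polygon rings are closed.
DISTRICT = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]


def point_in_polygon(lat, lon, ring):
    """Ray casting: cast a ray east from the point and count edge crossings.

    An odd number of crossings means the point is inside.
    Coordinates are treated as planar (a simplification).
    """
    inside = False
    n = len(ring)
    for i in range(n):
        lat1, lon1 = ring[i]
        lat2, lon2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude where this edge crosses the point's latitude.
            crossing = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < crossing:
                inside = not inside
    return inside


print(point_in_polygon(0.5, 0.5, DISTRICT))  # True
print(point_in_polygon(1.5, 0.5, DISTRICT))  # False
```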

Usually your data won’t include a shapefile. If you have high school graduation rates by school district, and you want to map those, you need to find a shapefile that describes the outline of each school district, and then you need to combine that shapefile with your data, by identifying a column that the two tables have in common.

The Shapefiles Tip Sheet has some excellent resources for finding shapefiles.

Making a map in Datawrapper

Caroline Kee covered a pretty straightforward, if disturbing, CDC report on rising suicide rates nationwide for BuzzFeed News. The map she included came directly from the CDC. It isn't a terrible map, but there are a few ways it could be much better. Can you tell at a glance which states stand out as having the most severe increase?

Take a look at the legend. The bucket sizes are wild: the darkest bucket spans 20 percentage points, and the next spans just six. These are quantiles: the CDC designed the buckets so that each would have 12 states in it. Then they chipped off Nevada, the only state that saw a decrease.
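Here is a quick way to see why quantile buckets end up with lopsided widths. This Python sketch compares quantile breaks (same count per bucket) with equal-interval breaks (same width per bucket) on a hypothetical, skewed list of percent changes. These are not the CDC's numbers. It uses the standard library's `statistics.quantiles`.

```python
import statistics

# Hypothetical percent changes for 12 places, skewed like many real datasets.
changes = [5, 8, 10, 12, 14, 15, 17, 19, 22, 30, 45, 58]


def quantile_breaks(values, k):
    """Cut points that put (roughly) the same number of values in each of k buckets."""
    return statistics.quantiles(values, n=k, method="inclusive")


def equal_interval_breaks(values, k):
    """Cut points that make each of k buckets the same width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


print(quantile_breaks(changes, 4))        # [11.5, 16.0, 24.0]
print(equal_interval_breaks(changes, 4))  # [18.25, 31.5, 44.75]
```

With quantiles, the top bucket runs from 24 all the way to 58 while the second bucket is only 4.5 points wide. That is the same kind of imbalance you see in the CDC legend.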

CDC Suicide Data

Luckily, BuzzFeed News links to the original report: the raw data is in the CDC report, which appeared in the Morbidity and Mortality Weekly Report. To avoid hiccups in the copy-and-paste process, I went ahead and pulled the numbers for you. Question: is this data organized into points, lines, or shapes?

https://app.workbenchdata.com/workflows/5852

Workbench is still a work in progress, but one thing it does well is show the transformations I applied to the original data.

Step 0: Download the csv from Workbench.

Step 1: Log into Datawrapper and choose “Create a map”. We want a choropleth.

create a map

Step 2 (Datawrapper calls this Step 1): search for “USA States” under “What type of map do you want to create?”.

Fun question that came up Week 1, which I couldn't answer off the top of my head: why is the electoral college hex map different from the population map? It turns out there are a few reasons. First, as I noted, the Census Bureau updates its population estimates every year, but electoral votes only change when House seats are reapportioned after each ten-year census. Second, electors aren't based entirely on population. Each state gets one elector for each of its senators (always 2) and one for each of its representatives in the House (this varies by state). DC gets 3 electors: the 23rd Amendment gives it as many as it would have if it were a state, but no more than the least populous state.

The Washington Post did some deeper reporting asking why North Dakota stands out so much. It’s worth reading if you’re interested in that question.
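The elector arithmetic from the Week 1 question can be written out in a few lines. This Python sketch only encodes the rule described above: House seats plus two senators per state, plus DC's 3. The `electors` helper is illustrative.

```python
HOUSE_SEATS = 435  # voting members of the House, split among the states
STATES = 50
DC_ELECTORS = 3    # set by the 23rd Amendment


def electors(house_seats: int) -> int:
    """A state's electors: one per House seat plus one per senator (always 2)."""
    return house_seats + 2


total = HOUSE_SEATS + 2 * STATES + DC_ELECTORS
print(total)        # 538
print(electors(1))  # 3: the two Senate electors triple a one-seat state's count
```

The two “free” Senate electors are why small states look bigger on an electoral map than on a population map.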

Step 3: Import your dataset. Get in the habit of reading pop-up windows. Do we have ISO-Codes or Names here?

Once you’ve uploaded your data, read through the next screen, too.

pay attention

As you step through these dialog windows, they should make sense!

You’ve already got a much cleaner map. But we’re going to hit Proceed and make it better.

Step 4: Customize your gradient and your tooltips. The nationwide average was a 25.4% increase, so you could reasonably center your buckets there, or you can keep the default gradient. Then make some tooltips.
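If you center your buckets on the national average, the break points are just the center plus or minus a fixed step. This Python sketch shows that arithmetic. The 10-point step and the number of buckets are arbitrary choices for illustration. You would still type the breaks into Datawrapper's own interface; this code doesn't talk to Datawrapper.

```python
NATIONAL_AVG = 25.4  # nationwide percent increase cited above


def centered_breaks(center, step, n_each_side):
    """Symmetric bucket edges around a center value, rounded to one decimal."""
    below = [round(center - step * i, 1) for i in range(n_each_side, 0, -1)]
    above = [round(center + step * i, 1) for i in range(1, n_each_side + 1)]
    return below + [round(center, 1)] + above


def bucket(value, breaks):
    """Bucket index for a value: 0 is below the lowest edge."""
    return sum(value >= edge for edge in breaks)


breaks = centered_breaks(NATIONAL_AVG, 10, 2)
print(breaks)  # [5.4, 15.4, 25.4, 35.4, 45.4]
print(bucket(20, breaks), bucket(30, breaks))  # 2 3
```

States just below the average land in bucket 2 and states just above it land in bucket 3. A diverging color scheme that splits between those two buckets makes “above or below the national average” readable at a glance.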


Play with the colors. ProPublica recommends Color Oracle to find safe colors and test for color blindness. It takes some setup, so for now we're going to use ColorBrewer because it's fast.

Step 5: Add your title and description. Never skip the metadata.

Title: What is the takeaway here? The story captioned this map “Suicide rates increased in almost all states between 1999 and 2016 — some by more than 30%.”, but the chart itself uses “Figure. Percent change in annual suicide rate,* by state – United States, from 1999/2001 to 2014/2016”.

Caption: Tell your readers more about what we’re looking at. “A recent CDC report found that there is just one state in the union – Nevada – where suicides did not rise between 1999 and 2016.”

Step 6: Embed it!

Fun question that came up Week 3: if these rates are both based on the 2000 population, how much does this map just reflect population growth? One answer, from the Census, is that between 2000 and 2010 Nevada was the US state with the highest growth.
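One way to reason about the Week 3 question is to compute the numbers yourself. A rate divides deaths by population, so population growth alone doesn't move it. The Python sketch below uses hypothetical counts and simple crude rates. The CDC's published rates may involve further adjustment, which this sketch does not model.

```python
def rate_per_100k(deaths, population):
    """Crude rate: deaths per 100,000 people."""
    return deaths * 100_000 / population


def percent_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100


# Hypothetical: population doubles and deaths double -> the rate doesn't move.
before = rate_per_100k(100, 1_000_000)
after_same = rate_per_100k(200, 2_000_000)
# Hypothetical: population doubles but deaths triple -> the rate rises 50%.
after_rise = rate_per_100k(300, 2_000_000)

print(percent_change(before, after_same))  # 0.0
print(percent_change(before, after_rise))  # 50.0
```

A fast-growing state like Nevada can see raw counts climb sharply while its rate holds steady. That is why maps of rates and maps of counts tell different stories.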

Making a Map in Fusion Tables

The Washington Post collected data on more than 52,000 criminal homicides over the past decade in 50 of the largest American cities. I filtered the data down to two local cities so we could take a closer look: Homicides in Oakland / Homicides in San Francisco. We could map these in Datawrapper, but we'd get frustrated with its built-in maps.
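Here is the filtering step as a minimal Python sketch, for anyone curious what Workbench is doing. The rows are hypothetical, and the column names (`city`, `state`, `disposition`, and so on) are an assumption about how the Post's CSV is laid out. Check them against the real header.

```python
import csv
import io

# Hypothetical rows; column names are ASSUMED to mirror the Post's CSV.
SAMPLE = """uid,reported_date,city,state,lat,lon,disposition
A-1,20150101,Oakland,CA,37.80,-122.27,Open/No arrest
B-2,20150102,Chicago,IL,41.88,-87.63,Closed by arrest
C-3,20150103,San Francisco,CA,37.77,-122.42,Closed by arrest
"""


def filter_cities(csv_text, cities):
    """Keep only the rows whose city column is in `cities`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["city"] in cities]


local = filter_cities(SAMPLE, {"Oakland", "San Francisco"})
print([row["uid"] for row in local])  # ['A-1', 'C-3']
```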

Question: Is this data points, lines, or shapes?

Step 1: Find the URL for the cleaned and filtered CSV in Workbench. Copy that.

Step 2: Create a new spreadsheet. Populate it with the =IMPORTDATA() function. What does the help menu say about how to use =IMPORTDATA()?

Step 3: Format the reported_date column so it reads as dates. We have to do this in our spreadsheet before we get to Fusion Tables.

Step 4: Create a new column and calculate the number of days the case has been open with =DAYS(TODAY(),D2) – stop and read what =DAYS() and =TODAY() do. What do they do?
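Steps 3 and 4 map neatly onto a few lines of Python. This sketch assumes `reported_date` arrives as an eight-digit `YYYYMMDD` string, which is why it has to be converted before any date math works. `days_open` mirrors `=DAYS(TODAY(), D2)`: whole days from the report date to today. Both helper names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_reported_date(value: str) -> date:
    """Turn an ASSUMED YYYYMMDD string such as '20150101' into a real date."""
    return datetime.strptime(value, "%Y%m%d").date()


def days_open(reported: date, today: Optional[date] = None) -> int:
    """Like =DAYS(TODAY(), reported): whole days from the report date to today."""
    today = today or date.today()
    return (today - reported).days


reported = parse_reported_date("20150101")
print(reported)                                # 2015-01-01
print(days_open(reported, date(2015, 1, 31)))  # 30
```

Passing `today` explicitly makes the result reproducible. Like `=TODAY()`, the default changes every day you open the file.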

Step 5: 🤔 What is wrong with this picture? (Hint: it's in the disposition column.) Not all of these homicides are “unsolved”. So we're going to filter down to only the open cases and apply our “days open” formula to them.
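Here is Step 5 as a Python sketch: keep only the open cases, then compute days open for those rows alone. The rows and dates are hypothetical. The open label `Open/No arrest` comes from the disposition values used later in this lesson. The match is exact, so any other spelling counts as closed.

```python
from datetime import date

# Hypothetical rows with reported dates already converted to real dates.
rows = [
    {"uid": "A-1", "reported": date(2015, 1, 1), "disposition": "Open/No arrest"},
    {"uid": "C-3", "reported": date(2015, 6, 1), "disposition": "Closed by arrest"},
    {"uid": "D-4", "reported": date(2016, 1, 1), "disposition": "Open/No arrest"},
]


def open_cases_with_age(rows, today):
    """Keep only open cases and add a days_open value to each one."""
    result = []
    for row in rows:
        if row["disposition"] == "Open/No arrest":
            result.append({**row, "days_open": (today - row["reported"]).days})
    return result


for case in open_cases_with_age(rows, today=date(2016, 1, 11)):
    print(case["uid"], case["days_open"])
# A-1 375
# D-4 10
```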

Step 6: Create a new Fusion Table. Go to Google Drive and select New > More > Google Fusion Tables (you might have to connect Fusion Tables as an app).

Step 7: Which column contains our location? We actually need a “two column location” which could be more intuitive than it is.

Step 8: This data contains all homicides, open and closed. So let’s play with a few ways to handle that.

Filter out the solved homicides.
Style the map by disposition – a process that should be easier than Google makes it.

a. Make a new table with File > New Table
b. We can use =UNIQUE(Sheet1!L2:L947) to get the exact values we need.
c. Fusion Tables actually recognizes 200 different map markers but we’re going to stick with small_red and small_green.
d. Create a new Fusion Table from your spreadsheet tab.
e. Head back to your original map of homicides and use File > Find Table to Merge With to merge them.
f. On your map, look at “Change feature styles…” and find the “Column” tab.

I wound up with a table that looks like this:

Disposition	icon name
Closed by arrest	small_green
Open/No arrest	small_red
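Sub-steps b through f are really a small lookup-and-merge. This Python sketch mirrors them. `unique_in_order` plays the role of `=UNIQUE()`, and the icon dictionary is the two-row table above. The sample rows, including the uncovered `Unknown` label, are hypothetical. The sketch reports any disposition the table doesn't cover instead of silently styling it.

```python
ICONS = {
    "Closed by arrest": "small_green",
    "Open/No arrest": "small_red",
}


def unique_in_order(values):
    """Like =UNIQUE(): each distinct value once, in first-seen order."""
    seen, out = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


def attach_icons(rows, icons):
    """Merge an icon name onto each row; list dispositions with no icon."""
    styled, missing = [], []
    for row in rows:
        icon = icons.get(row["disposition"])
        if icon is None:
            missing.append(row["disposition"])
        styled.append({**row, "icon name": icon})
    return styled, unique_in_order(missing)


rows = [{"disposition": d} for d in ["Open/No arrest", "Closed by arrest", "Open/No arrest", "Unknown"]]
print(unique_in_order(r["disposition"] for r in rows))
# ['Open/No arrest', 'Closed by arrest', 'Unknown']
styled, missing = attach_icons(rows, ICONS)
print([r["icon name"] for r in styled], missing)
# ['small_red', 'small_green', 'small_red', None] ['Unknown']
```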

Step 9: Last step, Tools > Publish

Note: New Media Report includes a nice Google Map tutorial if you want to keep playing with Google Maps.

Mapping Polygons in Fusion Tables

Fusion Tables makes it a little harder to merge data into a boundary file, but it does maintain a good collection of boundary files. We can select the US States data from its boundary files and merge it with our CDC data.

We’re going to have to walk through a few steps:

Step 1: Upload the csv of CDC data directly to a new fusion table.

Step 2: Open the USA State Boundaries shapefile.

Step 3: Use “Find table to merge with” to find your CDC table.

Step 4: Merge. Remember that we need an apples-to-apples merge: the column you match on has to hold the same values, spelled the same way, in both tables.
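Before merging, it helps to check whether the key columns actually line up. This Python sketch normalizes case and stray whitespace, then lists names that appear in only one table. Both name lists are hypothetical. Normalization won't fix genuinely different labels (for example “District of Columbia” in one table and “Washington” in another); those still need a human to spot and fix.

```python
def normalize(name: str) -> str:
    """Collapse case and extra whitespace so 'New  York ' matches 'new york'."""
    return " ".join(name.split()).casefold()


def check_merge_keys(left_keys, right_keys):
    """Return (only in left, only in right) after normalizing both sides."""
    left = {normalize(k) for k in left_keys}
    right = {normalize(k) for k in right_keys}
    return sorted(left - right), sorted(right - left)


# Hypothetical key columns from a data table and a boundary file.
cdc_names = ["Nevada", "New York ", "District of Columbia"]
boundary_names = ["nevada", "New  York", "Washington"]
print(check_merge_keys(cdc_names, boundary_names))
# (['district of columbia'], ['washington'])
```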

Mapbox

Another excellent option, if you're willing to learn (or cut and paste) some JavaScript, is Mapbox Studio. Its order of operations is nuanced and not intuitive if you're not familiar with some core principles of publishing maps on the internet. Mapbox's sample workflow is a good starting point, but it won't be 100% clear until you've spent some time working with their tools.

Styles are the visual rules that control how your map is drawn on the page. Tilesets are Mapbox's primary data format. A tileset is a collection of raster or vector data broken into a uniform grid of tiles, ready to load at various zoom levels. (If you've ever zoomed too fast on a Google Map, you've seen tiles in action.) Datasets are the editable feature collections that tilesets are built from. A dataset is your collection of lines, points, or shapes, with descriptive data attached.
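To make “a uniform grid of tiles at various zoom levels” concrete, this Python sketch finds which tile covers a given point. It uses the widely used XYZ / Web Mercator tiling scheme. That choice is an assumption here, since these notes don't say which scheme Mapbox uses internally. At zoom level z the world is split into 2^z × 2^z tiles, so every zoom step quadruples the tile count.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) tile covering a point in the XYZ Web Mercator grid."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp to the grid (handles lon == 180 and latitudes near the poles).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_at_zoom(zoom):
    """Total tiles needed to cover the world at this zoom level."""
    return 4 ** zoom


print(latlon_to_tile(10.0, -10.0, 1))  # (0, 0): the northwest quarter
print(latlon_to_tile(-10.0, 10.0, 1))  # (1, 1): the southeast quarter
print(tiles_at_zoom(10))               # 1048576
```

The quadrupling is why maps load tiles on demand. Nobody downloads a million tiles at zoom 10; you only fetch the handful currently on screen.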

Download the unsolved homicides CSV that you’re interested in. Make sure you know where your computer stored it.
Make an account.
Head into Mapbox Studio once you’re logged in.
On the “Datasets” tab, click on “New dataset” – upload your csv.

You can very quickly start looking at the data on a map. We can also go back to the Studio menu and start working on styles. Mapbox likes to start in Paris; if you aren't making a map of Paris, search for a different city so you can center your map there.

Add a layer. Even though you already uploaded it, you want to select “upload” and then look for Create From Dataset.

We'll play with this together until we have points on a map. You can embed that map and make it zoomable. To add any interactivity, however, you have to start with some of their JavaScript tutorials.

Homework

Pick one of the data sets you identified and map it!

Resources

Source's guide to Better Mapping is a fantastic roundup of articles.
We use QGIS and PostGIS in the data investigations class because they are much more powerful analytical tools. The learning curve is a bit steep, however, and QGIS doesn't produce interactive maps. QGIS can export SVG files that you can style in Illustrator or any other vector graphics editor for publication.
R is powerful statistical software, and it is not easy to learn. These Dutch election maps were made in ggplot2 (an R package), working from a comprehensive tutorial. Like QGIS, R can generate SVG files that you can style in Adobe Illustrator or any other vector graphics editor.
You can make maps in D3, but not without getting code on your hands. Mike Bostock has a solid tutorial.
For years, I taught students how to use Carto because it is easy to master and flexible. Unfortunately, they no longer offer a free tier to anyone but students, which means that if you're just publishing your first one or two maps, you have to commit to a paid account. They don't even publish the pricing for those accounts anymore. Sad trombone.
Tableau generates good maps. Peter Aldhous has a nice Tableau walkthrough from 2016. Unfortunately, I've never met a graphics editor who didn't have some kind of Tableau horror story.
Mapbox is powerful if you're game to learn some JavaScript (or just to cut and paste). Lo Benichou has written some fantastic Mapbox tutorials. And if you don't want or need interactivity, Mapbox Studio will let you design gorgeous map tiles, no JavaScript needed.

Workshops