The Age That Women Have Babies: How a Gap Divides America, New York Times 4 August 2018.
The New York Times went even deeper on the data we were playing with last week: they found and animated the spread over each year. They also mapped the average age of first-time mothers for every county in the US, and broke first-time mothers into a few categories:
We use maps to understand the data, to find a story, to tell a story. Some of my favorite examples:
“Geocoding” refers to the process of identifying an individual latitude/longitude pair for an address or other location description. To actually plot a location on a map, you need the location’s latitude and longitude.
219 West 40th Street means nothing without coordinates.
Geocoding is often challenging because there aren’t great free resources for batch jobs, that is, for processing many addresses at once. The Geocoding Tip Sheet is a roundup of good options, but public data sources often already include coordinates.
We use lines pretty rarely in intro maps, but a line is a series of two or more points connected together.
ZIP codes, council districts, police precincts – these are all polygons. Most of your maps will be built from polygons. These polygons are (usually) defined in one of two specialized file formats – a “Shapefile” or a “KML” file. The syntax of the file types varies, but they contain basically the same information – the polygon called “Bronx CB 04” is defined by this series of lat/lon pairs.
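To make points, lines, and polygons concrete, here is a minimal sketch in Python. It uses GeoJSON, a third common format that this lesson doesn’t otherwise cover, because it is easy to read. All coordinates are illustrative placeholders, not the real outline of Bronx CB 04 or the real location of any address.

```python
import json

# Placeholder coordinates for illustration only (not real boundaries).
# GeoJSON stores each position as [longitude, latitude].
point = {"type": "Point", "coordinates": [-73.99, 40.75]}

# A line is two or more points connected together.
line = {
    "type": "LineString",
    "coordinates": [[-73.99, 40.75], [-73.98, 40.76]],
}

# A polygon is a closed ring: the first and last positions must match.
polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-73.93, 40.82],
        [-73.91, 40.82],
        [-73.91, 40.84],
        [-73.93, 40.84],
        [-73.93, 40.82],
    ]],
}


def make_feature(geometry, name):
    """Wrap a geometry with a descriptive name, like a row in a shapefile."""
    return {"type": "Feature", "geometry": geometry, "properties": {"name": name}}


collection = {
    "type": "FeatureCollection",
    "features": [
        make_feature(point, "an address"),
        make_feature(line, "a street segment"),
        make_feature(polygon, "Bronx CB 04 (placeholder outline)"),
    ],
}

print(json.dumps(collection)[:80])
```

The part worth noticing is the polygon: it is just a named list of lat/lon pairs that ends where it started.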
Usually your data won’t include a shapefile. If you have high school graduation rates by school district, and you want to map those, you need to find a shapefile that describes the outline of each school district, and then you need to combine that shapefile with your data, by identifying a column that the two tables have in common.
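As a rough sketch of that combining step, here is the same idea in plain Python. The district IDs, outlines, and graduation rates are made up, and `join_on` is a hypothetical helper, not a function from any mapping tool.

```python
# Hypothetical boundary table: one row per district, with an outline.
boundaries = [
    {"district_id": "D01", "outline": [(0, 0), (0, 1), (1, 1), (0, 0)]},
    {"district_id": "D02", "outline": [(1, 0), (1, 1), (2, 1), (1, 0)]},
    {"district_id": "D03", "outline": [(2, 0), (2, 1), (3, 1), (2, 0)]},
]

# Hypothetical data table: graduation rates, keyed by the same column.
rates = [
    {"district_id": "D01", "grad_rate": 88.0},
    {"district_id": "D02", "grad_rate": 91.5},
]


def join_on(left, right, key):
    """Attach each right-hand row to the left-hand row sharing the same key.

    Returns the merged rows plus the keys that found no match, so missing
    districts are visible instead of silently dropped.
    """
    lookup = {row[key]: row for row in right}
    merged, unmatched = [], []
    for row in left:
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            merged.append({**row, **match})
    return merged, unmatched


merged, unmatched = join_on(boundaries, rates, "district_id")
print(len(merged), "districts merged; no data for:", unmatched)
```

Returning the unmatched keys is a deliberate choice here: a district with an outline but no data would otherwise just appear blank on the map.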
The Shapefiles Tip Sheet has some excellent resources for finding shapefiles.
Caroline Kee covered a pretty straightforward, if disturbing, CDC report on rising suicide rates nationwide. The map she included came directly from the CDC. It isn’t a terrible map, but there are a few ways it could be much better. Can you tell at a glance which states stand out as having the most severe increase?
Take a look at the legend. The sizes of those buckets are wild. The darkest has a 20-percentage-point spread, and the next has just a six-percentage-point spread. These are quantiles: the CDC designed the buckets so that each would have 12 states in it. And then they split off Nevada, which is the only state that saw a decrease.
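Here is a small sketch of why quantile buckets end up with such uneven spreads. The numbers are made up for illustration and are not the CDC’s figures; `quantile_buckets` is a hypothetical helper.

```python
def quantile_buckets(values, n_buckets):
    """Split sorted values into n_buckets groups of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_buckets)
    buckets, start = [], 0
    for i in range(n_buckets):
        # Spread any leftover values across the first few buckets.
        end = start + size + (1 if i < extra else 0)
        buckets.append(ordered[start:end])
        start = end
    return buckets


# Made-up percent changes with a couple of outliers at the top.
changes = [6, 8, 10, 12, 14, 16, 18, 19, 20, 22, 24, 26, 28, 30, 38, 58]

buckets = quantile_buckets(changes, 4)
for bucket in buckets:
    spread = bucket[-1] - bucket[0]
    print(len(bucket), "values, from", bucket[0], "to", bucket[-1], "spread", spread)
```

Every bucket holds the same number of values, but the top bucket has to stretch to reach the outliers, so its range is much wider than the others.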
Luckily, BuzzFeed News actually links to the original report – the raw data is available in the CDC’s original report which appeared in the Morbidity and Mortality Weekly Report. To avoid hiccups in the copy and paste process, I went ahead and pulled the numbers for you. Question: is this data organized into points, lines or shapes?
Workbench is still a work in progress, but one thing it does well is show the transformations I applied to the original data.
Step 0: Download the csv from Workbench.
Step 1: Log into Datawrapper and choose “Create a map”. We want a choropleth.
Step 2 (Datawrapper thinks of this as Step 1): Search for “USA States” under “What type of map do you want to create?”.
Fun question that came up Week 1, that I couldn’t answer off the top of my head: Why is the electoral college hex map different from the population map? It turns out there are a few reasons. First, as I noted, the census updates population estimates more often than electoral votes are reapportioned. Second, the number of electors isn’t based entirely on population. Each state gets one elector for each senator (2) and one for each member of the House (varies; seats are reapportioned every 10 years). DC always gets 3 electors (it can never have more than the least populous state).
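The arithmetic is simple enough to sketch. The seat counts passed in below are example inputs, not a lookup of any particular state.

```python
SENATORS_PER_STATE = 2


def electors(house_seats):
    """Electors for a state: one per senator plus one per House seat."""
    return SENATORS_PER_STATE + house_seats


# A state with a single House seat still gets 3 electors,
# which is why small states look bigger on the electoral hex map.
print(electors(1))
print(electors(10))
```

Because every state starts with two electors regardless of size, the electoral map can never be a straight rescaling of the population map.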
The Washington Post did some deeper reporting asking why North Dakota stands out so much. It’s worth reading if you’re interested in that question.
Step 3: Import your dataset. But get in the habit of reading pop-up windows. Do we have ISO-Codes or Names here?
Once you’ve uploaded your data, read through the next screen, too.
As you step through these dialog windows, they should make sense!
You’ve already got a much cleaner map. But we’re going to hit
Proceed and make it better.
Step 4: Customize your gradient and your tooltips. The average nationwide was a 25.4% increase. You could reasonably center your buckets there. Or you can keep the default gradient. And make some tooltips.
Step 5: Add your title and description. Never skip the metadata.
Title: What is the takeaway here? In the story, they captioned this “Suicide rates increased in almost all states between 1999 and 2016 — some by more than 30%.”, but the chart uses “Figure. Percent change in annual suicide rate,* by state– United States , from 1999/2001 to 2014/2016”
Caption: Tell your readers more about what we’re looking at. “A recent CDC report found that there is just one state in the union – Nevada – where suicides did not rise between 1999 and 2016.”
Step 6: Embed it!
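Going back to Step 4, centering the buckets on the national average can be sketched like this. The 25.4 figure comes from the text; the 10-point step, the labels, and the sample values are assumptions for illustration, and Datawrapper’s own gradient settings work differently.

```python
NATIONAL_AVERAGE = 25.4  # percent increase nationwide, from the text


def bucket_for(change, center=NATIONAL_AVERAGE, step=10.0):
    """Label a percent change by how far it sits from the center value."""
    offset = change - center
    if offset <= -step:
        return "well below average"
    if offset < 0:
        return "below average"
    if offset < step:
        return "at or above average"
    return "well above average"


# Made-up percent changes, not real state figures.
for change in [10.0, 20.0, 30.0, 40.0]:
    print(change, "->", bucket_for(change))
```

With a centered scale, a reader can tell at a glance whether a state is above or below the national figure, which the CDC’s quantile legend doesn’t show.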
Fun question that came up Week 3: If these rates are both based on the 2000 population, how much does this map just reflect population growth? One answer, from the Census, is that between 2000 and 2010 Nevada was the US state with the highest growth.
The Washington Post collected data on more than 52,000 criminal homicides over the past decade in 50 of the largest American cities. I filtered out two local cities so we could take a closer look: Homicides in Oakland and Homicides in San Francisco. We could map these in Datawrapper but we’re going to get frustrated with their built-in maps.
Question: Is this data points, lines, or shapes?
Step 1: Find the URL for the cleaned and filtered CSV in Workbench. Copy that.
Step 2: Create a new spreadsheet. Populate it with the =IMPORTDATA() function. What does the help menu say about how to use it?
Step 3: Format the reported_date column so it reads as dates. We have to do this in our spreadsheet before we get to Fusion Tables.
Step 4: Create a new column and calculate the number of days the case has been open with =DAYS(TODAY(),D2) – stop and read what =DAYS() and =TODAY() do. What do they do?
Step 5: 🤔 What is wrong with this picture? (Hint: it’s in the disposition column.) Not all of these homicides are “unsolved”. So we’re going to keep only the open cases and apply our “days open” function to them.
Step 6: Create a new Fusion Table. Go to Google Drive and select New > More > Google Fusion Tables (you might have to connect Fusion Tables as an app).
Step 7: Which column contains our location? We actually need a “two column location” which could be more intuitive than it is.
Step 8: This data contains all homicides, open and closed. So let’s play with a few ways to handle that.
a. Make a new table with File > New Table
b. We can use =UNIQUE(Sheet1!L2:L947) to get the exact values we need.
c. Fusion Tables actually recognizes 200 different map markers but we’re going to stick with
d. Create a new Fusion Table from your spreadsheet tab.
e. Head back to your original map of homicides and use File > Find Table to Merge With to merge them.
f. On your map, look at “Change feature styles…” and find the “Column” tab.
I wound up with a table that looks like this:
|Closed by arrest||small_green|
Step 9: Last step, Tools > Publish
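For anyone who wants to check the spreadsheet work from Steps 3 to 8 outside of Google Sheets, here is a minimal Python sketch of the same logic. The column names reported_date and disposition come from the data; the sample rows, the ISO date format, and the “Open/No arrest” label are assumptions for illustration.

```python
from datetime import date

# Made-up sample rows. Dates are assumed to be formatted as YYYY-MM-DD
# (the result of Step 3); the disposition labels are assumed values.
cases = [
    {"reported_date": "2017-01-15", "disposition": "Open/No arrest"},
    {"reported_date": "2016-06-01", "disposition": "Closed by arrest"},
    {"reported_date": "2015-03-20", "disposition": "Open/No arrest"},
]


def days_open(reported, today=None):
    """Days between the reported date and today, like =DAYS(TODAY(), D2)."""
    today = today or date.today()
    return (today - date.fromisoformat(reported)).days


def open_cases_with_days(rows, open_label="Open/No arrest", today=None):
    """Keep only open cases and add a days_open value to each."""
    return [
        {**row, "days_open": days_open(row["reported_date"], today)}
        for row in rows
        if row["disposition"] == open_label
    ]


# Distinct disposition values in order of first appearance, like =UNIQUE().
dispositions = list(dict.fromkeys(row["disposition"] for row in cases))

still_open = open_cases_with_days(cases)
print(len(still_open), "open cases;", "dispositions:", dispositions)
```

Passing `today` explicitly makes the result repeatable. With the default, the number changes every day, just as =TODAY() does in the spreadsheet.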
Note: New Media Report includes a nice Google Map tutorial if you want to keep playing with Google Maps.
Fusion Tables makes it a little harder to merge data into a boundary file, but they do maintain a good collection of boundary files. We can select the US States data from their boundary files and merge it with our CDC data.
We’re going to have to walk through a few steps:
Step 1: Upload the CSV of CDC data directly to a new Fusion Table.
Step 2: Open the USA State Boundaries shapefile.
Step 3: Use “Find table to merge with” to find your CDC table.
Step 4: Merge – remember we need to make sure we’ve got an apples to apples merge.
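One way to check for an apples-to-apples merge before you run it is to compare the key columns from both tables and see what fails to line up. This is a sketch in Python, not a Fusion Tables feature. The state names are real, but the formatting problems in the sample are invented.

```python
def normalize(name):
    """Trim whitespace and lowercase so cosmetic differences don't block a match."""
    return name.strip().lower()


def unmatched_keys(left_keys, right_keys):
    """Return keys that appear only on the left and only on the right."""
    left = {normalize(k) for k in left_keys}
    right = {normalize(k) for k in right_keys}
    return sorted(left - right), sorted(right - left)


# Invented example: the boundary file uses full names, while the data file
# has a stray space and one abbreviation.
boundary_names = ["Nevada", "North Dakota", "New York"]
cdc_names = ["nevada ", "North Dakota", "NY"]

only_in_boundaries, only_in_cdc = unmatched_keys(boundary_names, cdc_names)
print("no data for:", only_in_boundaries)
print("no shape for:", only_in_cdc)
```

Normalizing fixes the stray space and the capitalization, but “NY” versus “New York” is the names-versus-codes problem, and that has to be fixed in the data itself.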
Styles are the visual rules that control how your map is drawn on the page.
Tilesets are Mapbox’s primary data format. A tileset is a collection of images broken into a uniform grid of tiles, ready to load at various zoom levels. (If you’ve ever zoomed too fast on a Google Map you’ve seen tiles in action.)
Datasets are the editable feature collections that tilesets are built from. A dataset is your collection of lines, points, or shapes, with descriptive data attached.
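To see what a “uniform grid of tiles” means in practice, here is a sketch of the widely used Web Mercator XYZ tiling scheme, where each zoom level splits the world into a 2^zoom by 2^zoom grid. That Mapbox’s tilesets follow exactly this arithmetic is an assumption here; the lesson does not describe the internals. `tile_for` is a hypothetical helper.

```python
import math


def tile_for(lat, lon, zoom):
    """Return the (x, y) grid position of the tile containing a lat/lon.

    Uses the standard Web Mercator XYZ formula. No clamping is done, so it
    expects longitudes in [-180, 180) and latitudes away from the poles.
    """
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


# Approximate coordinates for central Paris.
paris = (48.8566, 2.3522)
for zoom in (0, 1, 12):
    print("zoom", zoom, "tile", tile_for(paris[0], paris[1], zoom))
```

At zoom 0 the whole world is one tile. Each additional zoom level quadruples the number of tiles, which is why a fast zoom briefly shows blurry or missing squares while the new tiles load.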
You can very quickly start looking at the data on a map. We can also go back to the Studio menu and start to work on making styles. Mapbox likes to start in Paris. If you aren’t making a map of Paris, search for a different city so you can center your map there.
Add a layer. Even though you already uploaded it, you want to select “upload” and then look for Create From Dataset.
Pick one of the data sets you identified and map it!
Source’s guide to Better Mapping is a fantastic roundup of articles.