Amanda Hickman
https://etherpad.opennews.org/p/maps_and_charts
As long as you email me before the course wraps up, I’ll be happy to review your homework. amanda@velociraptor.info
In New York, the Neighborhood You’re Shot in May Determine Whether You Survive, The Trace 17 October 2019; Maya Miller’s Twitter thread walks through their process in establishing the story.
New Jersey’s Map of the Stars is a celebrity tour of the Garden State. NJ.com 9 October 2019
CalMatters mapped School Closures around the state.
The Age That Women Have Babies: How a Gap Divides America, New York Times 4 August 2018.
New York Times went even deeper on the data we were playing with in March: they found and animated the spread over each year. They also mapped the average age for first time mothers for every county in the US, and broke first time mothers into a few categories:
Maps are a powerful reporting tool, and they can be a powerful storytelling tool. We’re just going to dabble at the edges today, but there are some great resources under “Keep learning” if you want to do more mapping.
ProPublica, Reveal and The Texas Tribune collaborated to map a series of stories about Hurricane Harvey’s impact in Houston. Credit: Al Shaw and Lisa Song, ProPublica; Kiah Collier, The Texas Tribune; Neena Satija, The Texas Tribune and Reveal, January 3, 2018.
We use maps to understand the data, to find a story, to tell a story. Some of my favorite examples:
Mago Torres and her colleagues spent over a year compiling data while reporting on mass graves across Mexico. Their project A dónde van los desaparecidos is available in English on The Intercept – 2,000 Clandestine Graves: How a Decade of the Drug War Turned Mexico Into a Burial Ground.
ProPublica’s Guide to Every Permitted Natural Gas Well in West Virginia powerfully illustrates the impact of a massive expansion in horizontal well permits.
Capital Public Radio’s history of California wildfires looks at a question that ought to be on everyone’s mind after two years in a row of devastating fall wildfires. They wrote about how they did it on Source. A few more fire maps:
Something in the water, King’s College School of Journalism (2017) started as a mapping exercise and grew from there.
National Geographic’s interactive map of bird migrations is stunning. They wrote about how they built it, too.
FT mapped broadband speeds in Britain and wrote about how they built it.
Sharon Lerner and Moiz Syed collaborated on a project looking at drinking water contamination from firefighting foam.
USA Today mapped maternal mortality across the US.
The Seattle Times put together a nice example of a Locator Map in their reporting on Swedish Health.
Points: “Geocoding” refers to the process of identifying an individual latitude/longitude pair for an address or other location description. To actually plot a location on a map, you need the location’s latitude and longitude. 906 Washington means nothing without coordinates.
Geocoding is often challenging because there aren’t great free resources for doing batch jobs or processing many addresses at once. The Geocoding Tip Sheet is a round up of good options, but often public data sources already include coordinates.
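Before you reach for a geocoder, it is worth checking whether your file already has coordinates. Here is a minimal sketch of that check in Python. The sample row, its coordinates, and the list of likely column names are all made up for illustration; real files will use their own headers.

```python
import csv
import io

# Hypothetical sample data; the coordinates are invented for illustration.
SAMPLE_CSV = """name,address,latitude,longitude
Library,906 Washington,37.8005,-122.2743
"""

# Column names that often hold coordinates (an assumption, not a standard).
LAT_NAMES = {"lat", "latitude", "y"}
LON_NAMES = {"lon", "lng", "long", "longitude", "x"}


def find_coordinate_columns(csv_text):
    """Return (lat_column, lon_column); each is None if no match is found."""
    header = next(csv.reader(io.StringIO(csv_text)))
    lat = next((h for h in header if h.strip().lower() in LAT_NAMES), None)
    lon = next((h for h in header if h.strip().lower() in LON_NAMES), None)
    return lat, lon


lat_col, lon_col = find_coordinate_columns(SAMPLE_CSV)
if lat_col and lon_col:
    print(f"Coordinates already present in {lat_col!r} and {lon_col!r}")
else:
    print("No coordinate columns found; these addresses need geocoding")
```

If both columns turn up, you can skip geocoding entirely and go straight to mapping.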
Lines: A line is a series of two or more points connected together – you learned that in Geometry. Think bus routes and flight paths.
Polygons: Counties, council districts, police precincts – these are all polygons. But not ZIP codes. ZIP codes aren’t shapes.
Y’all. ZIP codes are not defined areas. Addresses have ZIP codes. The definition of a “ZIP code” is a list of addresses. And there are lots of different ways to take a bunch of points and turn them into a polygon. –Lyzi “Bonecrusher” Diamond
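To see why that matters, here is a rough sketch that turns the same handful of points into two different polygons: a bounding box and a convex hull. The points are toy coordinates on a flat grid, not real addresses, and these are only two of many possible methods.

```python
def bounding_box_area(points):
    """Area of the smallest axis-aligned rectangle containing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull via the monotone chain method, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon using the shoelace formula."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Toy "addresses" that share a ZIP code (made-up grid coordinates).
addresses = [(0, 0), (4, 0), (2, 3), (2, 1), (1, 1)]

print("Bounding box area:", bounding_box_area(addresses))
print("Convex hull area:", polygon_area(convex_hull(addresses)))
```

Same addresses, two defensible shapes, and one is twice the size of the other. Neither one is "the" ZIP code.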
Geographic polygons are usually defined in one of two specialized file formats – a “Shapefile” or a “KML” file. The syntax of the file types varies, but they contain basically the same information – a series of lat/lon pairs that connect to form the shape.
If you have high school graduation rates by school district, and you want to map those, you need to find a shapefile that describes the outline of each school district, and then you need to combine that shapefile with your data, by identifying a column that the two tables have in common.
The Shapefiles Tip Sheet has some excellent resources for finding shapefiles.
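The join between shapes and data is the step that trips people up, so here is a minimal sketch of the idea in plain Python. The district IDs, outlines, and graduation rates are invented, and `district_id` is a hypothetical name for the shared column. In practice a mapping tool does this for you, but the logic is the same.

```python
# Hypothetical shapes: in a real project these come from a shapefile.
district_shapes = [
    {"district_id": "D1", "outline": [(0, 0), (1, 0), (1, 1), (0, 1)]},
    {"district_id": "D2", "outline": [(1, 0), (2, 0), (2, 1), (1, 1)]},
    {"district_id": "D3", "outline": [(2, 0), (3, 0), (3, 1), (2, 1)]},
]

# Hypothetical data table with made-up graduation rates.
graduation_rates = [
    {"district_id": "D1", "grad_rate": 88.5},
    {"district_id": "D2", "grad_rate": 79.0},
]


def join_on(shapes, data, key):
    """Attach each data row to the shape with the same key value.

    Returns (joined, unmatched): the merged rows, plus the key values of
    shapes that had no matching data row.
    """
    lookup = {row[key]: row for row in data}
    joined, unmatched = [], []
    for shape in shapes:
        row = lookup.get(shape[key])
        if row is None:
            unmatched.append(shape[key])
        else:
            joined.append({**shape, **row})
    return joined, unmatched


joined, unmatched = join_on(district_shapes, graduation_rates, "district_id")
print(f"{len(joined)} districts joined; no data for: {unmatched}")
```

Always look at what did not match. A district that silently drops off the map is usually a spelling or ID formatting mismatch between the two tables.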
This straightforward, if disturbing, BuzzFeed piece on a CDC report on rising suicide rates nationwide includes a map from the CDC. It isn’t a terrible map, but there are a few ways it could be much better.
Can you tell at a glance which states stand out as having the most severe increase?
Take a look at the legend. The sizes of those buckets are wild. The darkest has a 20 percentage point spread, and the next has just a six percentage point spread. These are quantiles: the CDC designed the buckets so that each would have ~12 states in it. And then they chipped off Nevada, which is the only state that saw a decrease.
I should be able to explain why epidemiologists prefer quantiles to equal intervals, but I haven’t dug in deep enough to be able to explain it. I did find what appears at a glance to be a scientific paper arguing against the practice.
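If the difference between quantiles and equal intervals feels abstract, this sketch compares the two on a small made-up list of percent changes (these are not the CDC numbers). Quantile breaks put the same number of values in each bucket; equal interval breaks give each bucket the same width. The function names are my own.

```python
import statistics


def equal_interval_breaks(values, buckets):
    """Break points that split the range min..max into equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    return [lo + width * i for i in range(1, buckets)]


def quantile_breaks(values, buckets):
    """Break points that put roughly the same number of values in each bucket."""
    return statistics.quantiles(values, n=buckets, method="inclusive")


def bucket_counts(values, breaks):
    """Count how many values land in each bucket defined by the breaks."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        # A value's bucket index is the number of breaks it exceeds.
        counts[sum(v > b for b in breaks)] += 1
    return counts


# Made-up percent changes with one outlier at the top and one decrease.
changes = [-1, 6, 8, 10, 12, 15, 18, 20, 22, 25, 30, 58]

for name, breaks in [
    ("quantile", quantile_breaks(changes, 4)),
    ("equal interval", equal_interval_breaks(changes, 4)),
]:
    print(name, breaks, bucket_counts(changes, breaks))
```

With an outlier in the data, equal intervals leave the top buckets nearly empty, while quantiles stretch the top bucket to cover a huge range. That is the trade-off you are looking at in the CDC legend.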
BuzzFeed News helpfully links to the original report – the raw data is available from the CDC in the Morbidity and Mortality Weekly Report.
» Question: is this data organized into points, lines or shapes?
To avoid hiccups in the copy and paste process, I went ahead and pulled the numbers for you. https://app.workbenchdata.com/workflows/5852
Workbench bills itself as a “data journalism platform” which seems a bit excessive, but it is a great way to share data, especially if you did some work to transform it from its source.
Step 0: Log into Datawrapper and choose “Create a map”. We want a choropleth. Fun thing I finally looked up: “choropleth” comes from the Greek words chóros (area or region) and plíthos (multitude).
Step 1: Search for “USA States” under “What type of map do you want to create?”.
Question that came up last time I taught mapping with Datawrapper: Why is the electoral college hex map different from the population hex map? It turns out there are a few reasons. First, the census updates population estimates more often than the electoral college is redesigned. Second, the actual electors aren’t based entirely on population. Each state gets one Elector for each senator (2) and one for each member of the House (varies; seats are reapportioned every 10 years). DC gets 3 Electors (it can have no more than the least populous state). 🏅 if you already knew that.
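The elector arithmetic is simple enough to write down. This is just a sketch of the formula described above: two senators plus however many House seats a state has, with DC's three added to the national total.

```python
SENATORS_PER_STATE = 2
STATES = 50
HOUSE_SEATS_TOTAL = 435  # fixed total, divided among the states by population
DC_ELECTORS = 3


def electors_for_state(house_seats):
    """Electors = one per senator plus one per House seat."""
    return SENATORS_PER_STATE + house_seats


total_electors = HOUSE_SEATS_TOTAL + SENATORS_PER_STATE * STATES + DC_ELECTORS

print("A state with a single House seat gets", electors_for_state(1), "electors")
print("Total electors nationwide:", total_electors)
```

Because every state starts with two electors regardless of population, small states take up more room on the electoral college hex map than on the population one.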
Step 3: Download your data from Workbench and add it to Datawrapper.
» Question: do we have ISO-Codes or Names in our data?
Once you’ve uploaded your data, read through the next screen, too. As you step through these dialog windows, they should make sense.
You’ve already got a much cleaner map. And a single state that really stands out. The Washington Post wrote more about why suicides have risen so much. They note how much ND stands out, but they don’t have an explanation either.
Step 4: Customize your gradient and your tooltips. The average nationwide was a 25.4% increase. You could reasonably center your buckets there. Or you can keep the default gradient. Add tooltips.
{{ Increase_Decrease }} of {{ Overall_Percent_Change }}%
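If you want to preview what that tooltip will say for a given row, here is a small sketch that mimics the placeholder substitution in Python. It is not Datawrapper's template engine, just a stand-in that swaps each `{{ column }}` for the value in that column. The example row is an assumption about how the values are stored; the 25.4 is the nationwide average mentioned above.

```python
import re

TEMPLATE = "{{ Increase_Decrease }} of {{ Overall_Percent_Change }}%"


def render_tooltip(template, row):
    """Replace each {{ column }} placeholder with that column's value."""
    def substitute(match):
        return str(row[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical row; the value "Increase" is an assumed format.
example_row = {"Increase_Decrease": "Increase", "Overall_Percent_Change": 25.4}

print(render_tooltip(TEMPLATE, example_row))
```

Reading a few rendered tooltips out loud is a quick way to catch awkward phrasing before you publish.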
Play with the colors. ProPublica recommends ColorOracle to find web safe colors and test for color blindness. ColorOracle takes some setup so for now use Color Brewer.
Step 5: Add a title and description. Fill in the rest of the metadata.
» Title: What is the takeaway here? In the BuzzFeed story, they captioned this “Suicide rates increased in almost all states between 1999 and 2016 — some by more than 30%.”, but the chart uses “Figure. Percent change in annual suicide rate,* by state– United States , from 1999/2001 to 2014/2016” – in reporting and story telling, your chart title should tell readers what they’re looking for in the chart.
» Caption: Tell your readers more about what we’re looking at. “A recent CDC report found that there is just one state in the union – Nevada – where suicides did not rise between 1999 and 2016.”
Step 6: Embed it!
» Question: If these rates are both based on the 2000 population, how much does this map just reflect population growth? It’s a good question. According to the Census, Nevada was the US state with the highest growth between 2000 and 2010, but suicides there dropped.
We looked at a chart from this story on Rising deaths from cocaine and methamphetamine use on Tuesday. I wanted to look at a map the CDC published from the data, too.
Look at their map of drug overdose deaths:
Let’s talk it through and try to recreate it.
The map shows all overdoses. So we need to dial in on the data a bit. We want to refine this data to include only overdose deaths for each state for the 12 month period ending December 2018.
We have to pare the data down quite a bit and we have to normalize it to the population. My walkthrough is at https://app.workbenchdata.com/workflows/36826/
We could take all those steps, and it’s probably worth doing, but to map it we’re going to need the CSV: https://app.workbenchdata.com/public/moduledata/live/185629.csv
Note: I spent a decent amount of time in this data but I don’t know it inside and out and I definitely have some questions. Before you do any reporting on this you want to make sure you actually understand what the age adjusted rates mean, and that I’m not getting this all wrong.
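For a sense of what "refine and normalize" means in practice, here is a rough sketch in Python. The rows, column names, and numbers are all made up; they are not the CDC's. It keeps only the rows for one 12 month period and computes a crude rate per 100,000 residents, which is not the same thing as an age adjusted rate.

```python
# Made-up rows with hypothetical column names, for illustration only.
rows = [
    {"state": "A", "period_ending": "2018-12", "deaths": 500, "population": 2_000_000},
    {"state": "A", "period_ending": "2018-11", "deaths": 480, "population": 2_000_000},
    {"state": "B", "period_ending": "2018-12", "deaths": 90, "population": 600_000},
]


def rate_per_100k(deaths, population):
    """Crude rate: deaths per 100,000 residents (not age adjusted)."""
    return deaths * 100_000 / population


def rates_for_period(rows, period):
    """Keep rows for one period and return {state: crude rate per 100k}."""
    return {
        row["state"]: rate_per_100k(row["deaths"], row["population"])
        for row in rows
        if row["period_ending"] == period
    }


december = rates_for_period(rows, "2018-12")
for state, rate in december.items():
    print(f"{state}: {rate:.1f} deaths per 100,000")
```

Notice that state A has far more deaths than state B but a lower rate once you account for population. That is the whole reason to normalize before you map.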
The Washington Post collected data on more than 52,000 criminal homicides over the past decade in 50 of the largest American cities. I pulled out two local cities so we could take a closer look: Homicides in Oakland and Homicides in San Francisco. We could map these in Datawrapper but we’re going to get frustrated with their built-in maps.
Unfortunately, we don’t have good options right now for a very local map of points. I loved Carto for this but their free tier is way too limited to do anything with.
Datawrapper’s locator maps are fantastic if you’ve got fewer than 100 points to map.
Mapbox is a much more powerful option, but it’s also much more complex. We can map these as points, but adding hover windows is going to take a bit of JavaScript.
If you’re willing to learn (or cut and paste) some JavaScript, Mapbox Studio is pretty great. Their order of operations is kind of nuanced and not obvious or intuitive if you’re not familiar with some core principles of publishing maps on the internet. Their sample workflow is a good starting point, but it won’t be 100% clear until you’ve spent some time working with their tools.
Styles are the visual rules that control how your map is drawn on the page. Tilesets are Mapbox’s primary data format. A tileset is a collection of images broken into a uniform grid of tiles, ready to load at various zoom levels. (If you’ve ever zoomed too fast on a Google Map you’ve seen tiles in action.) Datasets are the editable feature collections that tilesets are built from. A dataset is your collection of lines, points, or shapes, with descriptive data attached.
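To make the "uniform grid of tiles" idea concrete, here is a sketch of the tile arithmetic most web maps use: each zoom level splits the world into a 2^zoom by 2^zoom grid, so the tile count quadruples with every zoom step. I am assuming the common XYZ ("slippy map") Web Mercator scheme here rather than quoting anything Mapbox-specific, and the Paris coordinates are approximate.

```python
import math


def tiles_at_zoom(zoom):
    """Number of tiles covering the whole world at a given zoom level."""
    return 4 ** zoom


def tile_for_point(lat, lon, zoom):
    """Column (x) and row (y) of the tile containing a lat/lon point.

    Uses the standard XYZ Web Mercator formula; valid for latitudes
    between roughly -85 and 85 and longitudes below 180.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


PARIS = (48.8566, 2.3522)  # approximate latitude, longitude

for zoom in (0, 1, 12):
    print(zoom, tiles_at_zoom(zoom), tile_for_point(*PARIS, zoom))
```

At zoom 0 the whole world is one tile. By zoom 12 there are millions, which is why the map only loads the handful you are looking at.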
You can very quickly start looking at the data on a map. We can also go back to the Studio menu and start to work on making styles. Mapbox likes to start in Paris. If you aren’t making a map of Paris search for a different city so you can center your map there.
Style and Select Data and look for “Conditions”.
My rough finished map. To add any popover windows we need to get into JavaScript.
They have a great tutorial on adding hover interactivity to maps.
We’ll play with this together until we have points on a map. And you can embed that map and make it zoomable. To add any interactivity, however, you have to start with some of their JavaScript tutorials.
Optional: If you plan to include a map in your portfolio project, send me a pitch by next week – let me know what data you plan to map, where you found it and how you think it moves your story forward. If I get a thoughtful pitch by April 10, I’ll work with you to get your map into shape.
I’m amanda@velociraptor.info if you have any questions or want guidance or advice.
Source’s guide to Better Mapping is a fantastic round up of articles.
Mapbox is powerful, but not nearly as cut and paste as Datawrapper. Lo Benichou has written some fantastic Mapbox tutorials.
QGIS plays nicely with PostGIS and in combination the two can be a powerful geographic analytical tool. QGIS doesn’t output interactive maps but you can run much more complex queries like measuring distance or identifying all the points inside of a given shape. QGIS will generate SVG files that you can style in a vector graphics editor (e.g. Inkscape, Illustrator) for publication. This is what I use.
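To give a flavor of those two kinds of queries, here is a sketch in plain Python. This is not how QGIS or PostGIS do it internally; it is a haversine great-circle distance and a simple ray casting test, which are common textbook approaches. The city coordinates are approximate and the square is a toy shape.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is not a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Only edges that straddle the point's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Approximate city center coordinates (latitude, longitude).
OAKLAND = (37.8044, -122.2712)
SAN_FRANCISCO = (37.7749, -122.4194)
print(f"Oakland to San Francisco: {haversine_km(*OAKLAND, *SAN_FRANCISCO):.1f} km")

# Toy square on a flat grid.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square), point_in_polygon((5, 2), square))
```

For anything you plan to publish, lean on QGIS or PostGIS, which handle projections and edge cases that this sketch ignores.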
These Dutch election maps were made in ggplot2 (an R package), working from a comprehensive tutorial. You can definitely do powerful geographic analysis inside of R.
Andrew B Tran publishes his NICAR workshops on GitHub, including his session on Mapping with R.
You can create interactive maps in D3. Mike Bostock has a solid tutorial available.