23 October 2014

Points, Shapes and Lines

Mapping Points

“Geocoding” refers to the process of identifying an individual latitude/longitude pair for an address or other location description. To actually plot a location on a map, you need the location’s latitude and longitude. 219 West 40th Street means nothing without coordinates.

Geocoding is actually challenging because there aren’t good, free resources for doing batch jobs, where many addresses are geocoded at once. My Geocoding Tip Sheet includes some helpful resources, but many city data sources actually include coordinates, so double check that, first.

If you’re committed to mapping points, you may need my help geocoding them.

Mapping Lines

No student has ever pitched me a compelling map that features lines rather than shapes or points. I did a project that drew out flight maps showing how far from home every prisoner incarcerated in Florence, CO is, but I pitched that, so it doesn’t count. To draw that map I had to take a crash course in rendering lines. If you’re excited about doing something like this, great! But you’re going to need to install R and walk through Nathan Yau’s tutorial before you do anything else. And that’s not where I’d start learning how to make a map.

Mapping Polygons

Zipcodes, council districts, police precincts – these are all polygons. Most of your maps will be in polygons. These polygons are defined in (usually) one of two specialized file formats – a “Shapefile” or a “KML” file. The syntax of the file types varies, but they contain basically the same information – the polygon called “Bronx CB 04” is defined by this series of lat/lon pairs. My Shapefiles Tip Sheet has some excellent resources for finding shapefiles, especially within New York City.

Often (usually) your data won’t include a shapefile. If you have High School graduation rates by school districts, and you want to map those, you need to find a shapefile that describes the outline of each school district, and then you need to combine that shapefile with your data, by identifying a column that the two tables have in common.

Walking Through It

BLS Map of workplace fatalities

The BLS published this gorgeous map of workplace fatalities a few years ago and I never get tired of using it in class.

Discussion: How could we improve on this map?

Hands On

Start with the BLS Data.


####Calculate fatalities per 100,000 residents? =(B1/C1)/100000

Find a US State Shapefile.

Upload both to CartoDB. What do they have in common?

Merge them. You can merge your fatality data into your us_states table:

UPDATE us_states SET bls_fatalities_per_100k = fatalities_per_capita.fatalities_per_capita
FROM fatalities_per_capita
WHERE us_states.name = fatalities_per_capita.state

Or you can merge state outlines into your BLS data table:

UPDATE bls_fatality_2011
SET the_geom = us_states.the_geom
FROM us_states
WHERE us_states.state = bls_fatality_2011.state

more notes

Advanced Queries

Where CartoDB really shines is when you plug into its underlying postGIS database. For instance … you can use ST_Union to merge two shapes (or polygons) into a single shape. For instance, if … hypothetically … you had county by county data on asthma rates in New York State, but only one line for “New York City” rather than data for each borough, and a shapefile with county outlines, you’d need to figure out how to create a single polygon of all five boroughs. Which you’d do about like this:

INSERT INTO ny_counties (county) VALUES ('New York City')

This INSERT query will add a new row to your ny_counties table, with “New York City” listed as value in the county column. INSERT will change your data by adding rows.

SELECT * FROM ny_counties WHERE county IN ('Kings', 'Queens', 'Richmond', 'Bronx', 'New York')

This SELECT query will not alter your data. It will select all the rows in a table called ny_counties where the county name (or the value in the county column) is in the list. This is a sanity check, it will let you see wether or not the next query is going to work.

UPDATE ny_counties SET the_geom = (
	SELECT ST_Union(the_geom) FROM ny_counties 
	WHERE county IN ('Kings', 'Queens', 'Richmond', 'Bronx', 'New York')
) WHERE county = 'New York City'

This UPDATE query will calculate a polygon that is the “Union” or combined area of all the polygons that represent matching counties, and add that polygon to the the_geom on the row that you added earlier, which has a county value of “New York City”. UPDATE will change your data by altering existing rows based on the WHERE clause. If you were to omit the WHERE clause, the query would change the_geom for every row.

That there are more efficient ways to do all of this, with nested queries, but I think you’re better off doing it in small steps, for now.

CUNY Graduate School of Journalism

© Fall 2014 Amanda Hickman