Notes on Using Refine
Once you’ve installed OpenRefine, you may see an error like “… is damaged and can’t be opened. You should move it to the Trash.” The solution is not at all intuitive: you have to change your Privacy and Security settings to allow applications downloaded from “Anywhere” (it’s on the “General” tab).
Refine runs a little server that you can access in your browser. It will always be at http://127.0.0.1:3333 once it is running.
When you upload a file, you have to click “Create Project” in the top right to get started. You have a bunch of “parsing” options about how Refine should treat data. I haven’t really plumbed those, but that should not stop you from doing so.
The state publishes a complete list of all 6844 Statewide Public Officials. Download the data and pivot to get a count by “Office Title”. You can see very quickly that there’s kind of a problem.
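To make the problem concrete, here is a minimal sketch of what that pivot is doing. The titles below are made up for illustration and are not taken from the state's file; the point is only that near-identical spellings get counted as separate offices.

```python
from collections import Counter

# Hypothetical sample values, NOT from the real download.
# Note the trailing space and the different capitalization.
office_titles = [
    "Justice of the Peace",
    "Justice of the Peace ",
    "justice of the peace",
    "County Clerk",
    "County Clerk",
]

# A pivot by "Office Title" is essentially a count per distinct string.
counts = Counter(office_titles)

for title, n in counts.most_common():
    # repr() makes stray whitespace visible in the output.
    print(repr(title), n)
```

Three of the four distinct keys here are really the same office, which is the kind of mess the steps below are meant to clean up.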
So in Refine, we’ll clean it up.
- Use the pull down next to Office Title and select Facet > Text Facet
- Find the cluster button. Review, Merge Selected and Re-Cluster.
- Why aren’t there new matches?
- Look under Edit Cells > Common transforms and think about what “Trim whitespace” is likely to do.
- Cluster again.
- Try a new “Method” – more matches, some of which aren’t real overlaps.
- Re-cluster for “City” and “City2”
- Open a Custom numeric facet... for the “Commissioned Date” and see what
- Try splitting the zip code.
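If it helps to see the idea behind clustering spelled out, here is a rough sketch of key-collision grouping in Python. This is a simplified illustration, not Refine's actual implementation: the `fingerprint` and `cluster` functions and the sample titles are hypothetical, and the real methods do more normalization than this.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a key; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical titles, NOT from the real download.
titles = [
    "Notary Public",
    "notary public ",
    "Public Notary",
    "NOTARY PUBLIC.",
    "Justice of the Peace",
]

clusters = cluster(titles)
for group in clusters:
    print(group)
```

Values that collapse to the same key land in one candidate cluster, and a person still has to review each one before merging. A looser key would catch more variants but, as with trying a new “Method” above, would also start grouping things that aren't real overlaps.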
Check out the faceting documentation for more insights.
Take a look at the undo/redo history – you can create scripts to use over and over.
If you want more practice with refine, you might find one of these tutorials helpful: