For visualizing some historical data, I've recently been toying around with Protovis and GeoCommons. In the past, I've played with TimeMap and other SIMILE widgets. All of these tools take input in structured text forms, but the format each tool wants is different.
To learn what these tools can do, I usually just hack the data together in a source file. With GeoCommons, I've been exporting CSV data from a Google Docs spreadsheet using URL-based queries. Figuring out the URL queries is a little complex, but I like the ease of use of the Google Docs UI. More and more, though, I keep coming up with little data sets that I'd rather not hand-code into a particular JSON/CSV/etc. format, so that I can get on with the intellectual work of my research.
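For concreteness, this is roughly the shape of that URL-based export; the spreadsheet key and query below are placeholders rather than a real document, and the exact endpoint details may differ for your own sheet:

```python
# A sketch of the URL-based CSV export I mean; the key, query, and endpoint
# details are placeholders, not my actual spreadsheet.
import csv
import io
import urllib.parse
import urllib.request

KEY = "MY_SPREADSHEET_KEY"                  # placeholder spreadsheet key
QUERY = "SELECT A, B, C WHERE D > 1800"     # Google's visualization query language

url = (
    "https://docs.google.com/spreadsheets/d/%s/gviz/tq?tqx=out:csv&tq=%s"
    % (KEY, urllib.parse.quote(QUERY))
)

# Fetch the query result as CSV and parse it into rows.
with urllib.request.urlopen(url) as resp:
    rows = list(csv.reader(io.StringIO(resp.read().decode("utf-8"))))

print(rows[:5])
```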
As a historian, I should probably be keeping my data in more citationally rigorous formats than JSON will support, but my data sets are still small and idiosyncratic enough that going to a full-scale database seems like overkill to me. So, I've got a few questions:
- When I'm doing experimental, exploratory visualization work with different tools, and the structure of my data isn't apparent at the outset, how should I assess whether to put it into a database first and then export views of that data to my visualization tools?
- If I want to keep using Google Docs for simple data storage and querying but don't want to have to make my data sets public, what's the easiest-to-use library for interacting with their authorization API?
- Once I've settled on the best way to store various data sets, what tools/libraries can I use to transform them easily into the formats and data structures that different web services want to see? (Please, nothing having to do with XSLT, unless you've got pointers that'll make that learning curve flatter.) Right now I'm hand-rolling one-off converters like the sketch after this list.
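This is the kind of throwaway glue I keep writing by hand; the column names and the target JSON layout are made up for illustration, not any particular widget's real schema:

```python
# Throwaway converter: read a CSV export and emit the nested JSON that one
# particular timeline widget might want. Column names and output keys are
# invented for illustration.
import csv
import json

def csv_to_timeline_json(csv_path, json_path):
    events = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            events.append({
                "start": row["date"],
                "title": row["title"],
                "description": row["notes"],
            })
    with open(json_path, "w") as out:
        json.dump({"events": events}, out, indent=2)

csv_to_timeline_json("letters.csv", "letters-timeline.json")
```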
I use OS X primarily, and I'm not afraid of working with the shell; my preferred languages are Python and Ruby, though obviously I'm having to do a lot with JavaScript too. I hate debugging JavaScript, though, and avoid it when I can.
(Edited to add: If anyone has bright ideas on good ways to preserve citational rigor in my data storage, that's important to me too. Lots of the data sets I'm creating are composited from facts found in particular manuscript items, and I need to be able to preserve the provenance of each data point. That could be as simple as an extra field with a Zotero citation code in it, but I can't lose sight of where the data comes from.)
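To make that concrete, I'm imagining each data point carrying its source alongside the values, something like this (the Zotero item key and the archival citation are invented placeholders):

```python
# Sketch of a data point that keeps its provenance attached; the Zotero item
# key and the citation below are invented for illustration.
datapoint = {
    "date": "1787-06-12",
    "place": "Philadelphia",
    "value": 3,
    "source": {
        "zotero_key": "ABCD1234",  # hypothetical Zotero item key
        "citation": "Letter, Smith to Doe, 12 June 1787, Box 4, Folder 2",
    },
}
```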