For visualizing some historical data, I've recently been toying around with Protovis and GeoCommons. In the past, I've played with TimeMap and other SIMILE widgets. All of these tools take input in structured text forms, but the format each tool wants is different.
To learn what these tools can do, I usually just hack the data together in a source file. With GeoCommons, I've been exporting CSV data from a Google Docs spreadsheet using URL-based queries. Figuring out the URL queries is a little complex, but I like the ease of use of the Google Docs UI. More and more, though, I keep coming up with little data sets that I'd rather not hand-code into a particular JSON/CSV/etc. format, so that I can get on with the intellectual work of my research.
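For concreteness, this is roughly the shape of that URL-based export; the spreadsheet key and query below are placeholders rather than a real document, and the exact endpoint details may differ for your own sheet:

```python
# A sketch of the URL-based CSV export I mean; the key, query, and endpoint
# details are placeholders, not my actual spreadsheet.
import csv
import io
import urllib.parse
import urllib.request

KEY = "MY_SPREADSHEET_KEY"                  # placeholder spreadsheet key
QUERY = "SELECT A, B, C WHERE D > 1800"     # Google's visualization query language

url = (
    "https://docs.google.com/spreadsheets/d/%s/gviz/tq?tqx=out:csv&tq=%s"
    % (KEY, urllib.parse.quote(QUERY))
)

# Fetch the query result as CSV and parse it into rows.
with urllib.request.urlopen(url) as resp:
    rows = list(csv.reader(io.StringIO(resp.read().decode("utf-8"))))

print(rows[:5])
```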
As a historian, I should probably be keeping my data in more citationally rigorous formats than JSON will support, but my data sets are still small and idiosyncratic enough that going to a full-scale database seems like overkill to me. So, I've got a few questions:
- When I'm doing experimental, exploratory visualization work with different tools, and the structure of my data isn't apparent at the outset, how should I assess whether to put it into a database first and then export views of that data to my visualization tools?
- If I want to keep using Google Docs for simple data storage and querying but don't want to have to make my data sets public, what's the easiest-to-use library for interacting with their authorization API?
- Once I've settled on the best way to store various data sets, what tools/libraries can I use to transform them easily into the formats and data structures that different web services want to see? (Please, nothing having to do with XSLT, unless you've got pointers that'll make that learning curve flatter.) Right now I'm hand-rolling one-off converters like the sketch after this list.
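This is the kind of throwaway glue I keep writing by hand; the column names and the target JSON layout are made up for illustration, not any particular widget's real schema:

```python
# Throwaway converter: read a CSV export and emit the nested JSON that one
# particular timeline widget might want. Column names and output keys are
# invented for illustration.
import csv
import json

def csv_to_timeline_json(csv_path, json_path):
    events = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            events.append({
                "start": row["date"],
                "title": row["title"],
                "description": row["notes"],
            })
    with open(json_path, "w") as out:
        json.dump({"events": events}, out, indent=2)

csv_to_timeline_json("letters.csv", "letters-timeline.json")
```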
I use OS X primarily, and I'm not afraid of working with the shell; my preferred languages are Python and Ruby, though obviously I'm having to do a lot with JavaScript too. I hate debugging JavaScript, though, and avoid it when I can.
(Edited to add: If anyone has bright ideas on good ways to preserve citational rigor in my data storage, that's important to me too. Lots of the data sets I'm creating are composited from facts found in particular manuscript items, and I need to be able to preserve the provenance of each data point. That could be as simple as an extra field with a Zotero citation code in it, but I can't lose sight of where the data comes from.)
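To make that concrete, I'm imagining each data point carrying its source alongside the values, something like this (the Zotero item key and the archival citation are invented placeholders):

```python
# Sketch of a data point that keeps its provenance attached; the Zotero item
# key and the citation below are invented for illustration.
datapoint = {
    "date": "1787-06-12",
    "place": "Philadelphia",
    "value": 3,
    "source": {
        "zotero_key": "ABCD1234",  # hypothetical Zotero item key
        "citation": "Letter, Smith to Doe, 12 June 1787, Box 4, Folder 2",
    },
}
```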