I am looking for software that will display digital image files of handwritten material on the same screen as a workspace for creating transcriptions of that material, preferably in a collaborative context.
collaborative software for transcribing digital images of handwritten documents
(14 posts) (12 voices)
Posted 8 years ago
I think what you'll want is Scripto, currently being developed by the Center for History and New Media. I've written about a few other possibilities, but from what I can tell, Scripto is really going to be the best tool for the job. It's in an early alpha stage at the moment.
Posted 8 years ago
You may also want to look at what Islandora has done with TEI image transcriptions in Drupal.
Posted 8 years ago
My own open-source project, FromThePage, has been used by a small group of volunteers to transcribe more than a thousand pages of early twentieth-century diaries over the last couple of years. Compared to the alternatives, I believe that it is particularly well suited to transcription projects that include annotation and indexing of free-form sources.
Although I plan a general release before the end of the year, I'd be happy to work with you to get something running before then. Indeed, the FromThePage beta site is ready for immediate use, if you're open to the idea of managing the project on hosted servers instead of running the software yourself.
Posted 8 years ago
I'm thinking of T-PEN, which has not launched yet but is on the verge of it. What is known about it so far, however, is rather promising. Here is the link to their blog: http://digital-editor.blogspot.com/
Posted 8 years ago
You might also want to check out Ben Brumfield's (above) Collaborative Manuscript Transcription blog: http://manuscripttranscription.blogspot.com/
You will find all kinds of goodies there.
Posted 8 years ago
(Thanks, Alex!)
According to comments made recently on my review of the MediaWiki ProofreadPage extension for manuscript transcription, it looks like the Hebrew- and German-language Wikisource instances have launched some projects transcribing manuscript material. ProofreadPage is certainly worth looking at, especially if your scanned images are already in DjVu format.
Can you tell us a bit more about the kind of materials you're looking to transcribe? Are they structured (e.g. muster rolls, census forms) or free-form (e.g. letters, diaries)? How paleographically complex are they? Do you hope to produce annotated editions, simple (flat) transcriptions, or just indexes of people and places mentioned within the documents?
Posted 8 years ago
So, does anyone know of something similar, but for audio files? It seems like the pile of oral histories to transcribe never gets any smaller. . .
Posted 8 years ago
Replying to @Kaia's post:
Kaia, there seems to be a separate thread about this right now.
Posted 8 years ago
Hi folks
You should also look at what we are doing over at Transcribe Bentham. We will be making the code public for that soon, for other people to use: http://www.transcribe-bentham.da.ulcc.ac.uk/td/Transcribe_Bentham

The TILE project (http://mith.info/tile/) will also provide "a new web-based, modular, collaborative image markup tool for both manual and semi-automated linking between encoded text and image of text, and image annotation".
Posted 8 years ago
TILE by itself wouldn't really help with this, but it's designed to be incorporated into other systems (VRE, for example), and we're hopeful that once it's released this year it will actually be used that way. This may sound naive, but what about just using a wiki? Could you format the pages so that the images are on the left and you transcribe on the right, or something similar? Otherwise, I'd second T-PEN and Transcribe Bentham (the advantage of TB being that it's actually going to be available soon).
Posted 8 years ago
Sorry to dredge up an old topic, but are any of the tools listed here capable of importing images from, or otherwise interacting with, DSpace? FromThePage appears to support some of the annotations that I would like to make about manuscript images, but I want to avoid duplicating images if possible (since various derivatives will already be available in DSpace, should I elect to use it).
Also, is anyone aware of annotation tools that interact with web services, such as linking manuscripts to resources defined in VIAF or Geonames?
What sorts of export mechanisms are provided by these annotation tools? Can I export to TEI? MODS? Something else?
I envision using an annotation tool in the back-end to generate TEI or some other robust form of metadata with links to thesauri like VIAF, Geonames, or the Pleiades Gazetteer of Ancient Places in order to import a lot of open data for enhanced context, generate maps, etc.
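For illustration, here is a minimal sketch of the kind of output described above: TEI in which names and places are wrapped in `persName`/`placeName` elements whose `ref` attributes point at authority URIs. It assumes Python's standard-library `xml.etree.ElementTree`; the `AUTHORITIES` table, its placeholder IDs (not real VIAF or GeoNames records), and the naive string matching are all hypothetical stand-ins for whatever an annotation tool would actually record.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

# Hypothetical lookup table; the IDs are placeholders, not real authority records.
AUTHORITIES = {
    "Jane Doe": ("persName", "http://viaf.org/viaf/0000000"),
    "Springfield": ("placeName", "http://sws.geonames.org/0000000/"),
}

def tag(name):
    """Return the Clark-notation ({namespace}name) tag for a TEI element."""
    return f"{{{TEI_NS}}}{name}"

def _append_text(parent, last, s):
    """Attach plain text to parent.text, or to the tail of the last child."""
    if not s:
        return
    if last is None:
        parent.text = (parent.text or "") + s
    else:
        last.tail = (last.tail or "") + s

def encode_line(text, authorities):
    """Wrap each known name in `text` in a TEI element carrying a ref URI."""
    p = ET.Element(tag("p"))
    last = None
    pos = 0
    while True:
        hits = [(text.find(n, pos), n) for n in authorities]
        hits = [h for h in hits if h[0] != -1]
        if not hits:
            break
        # earliest match wins; on a tie, prefer the longer name
        start, name = min(hits, key=lambda h: (h[0], -len(h[1])))
        _append_text(p, last, text[pos:start])
        element, uri = authorities[name]
        last = ET.SubElement(p, tag(element), ref=uri)
        last.text = name
        pos = start + len(name)
    _append_text(p, last, text[pos:])
    return p

if __name__ == "__main__":
    line = encode_line("Letter from Jane Doe at Springfield.", AUTHORITIES)
    print(ET.tostring(line, encoding="unicode"))
```

The `ref` URIs are what would let downstream tools pull in open data (biographies, coordinates for maps) without the transcription itself storing any of it.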
Posted 5 years ago
Replying to @Ethan Gruber's post:
These are excellent questions, Ethan.
Regarding image importing, at the moment the only transcription tools I know of that support any sort of integration with external images are Scripto (Omeka, WordPress, Drupal), FromThePage (Internet Archive), Islandora TEI Editor (Fedora), the (closed-source) BYU Historic Journals project (CONTENTdm), and Zooniverse Scribe (via deep-linking to any image on the internet, at a per-page level). This is, in my opinion, a serious and non-trivial problem with all transcription tools -- just last week I talked with a librarian who really wanted to use T-PEN for medieval manuscript fragments, but was going to use Scripto instead, purely because the images were on her Omeka site. Rather than using the right tool for the job (and my own FromThePage would be just as inappropriate as Scripto for this use), she was using the tool that integrated with her CMS. The problem is that it's a lot of effort to integrate tool X with CMS Y, and it's not a problem that scales well along either dimension.
I'm not aware of any transcription tools that do linked data at all. I know that I've looked into it for FromThePage, but those investigations remain just that -- exciting ideas which do not override improvements to the core tool.
Regarding exports, the XML- (or TEI-) native T-PEN and Bentham Transcription Desk should support exporting whatever TEI you put into them. Ditto for MOM-CA-based tools like Itinera Nova, Monasterium, and Virtuelles deutsches Urkundennetzwerk, as well as the Papyrological Editor. Mind you, "should" is my own term -- I don't know whether any of these tools actually features an 'export' button or API. FromThePage converts transcripts, subjects, indices, and edit histories into one big HTML file for export, with classes that may allow extraction and conversion. (I explored TEI export in 2009, but at the time had no background in TEI and abandoned the project -- things will be different soon, however.) I gather that Scripto exports to the CMS that is its database of record, and I suspect that CrowdCrafting may well have an export feature as well.
I love your idea of combining annotation and open data -- it seems to come up all the time in conversation these days, but I don't know of any projects which have gotten beyond the idea stage.
Posted 5 years ago
Replying to @Ethan Gruber's post:
It's worth mentioning that Scripto itself is CMS-neutral (See this). Developers can build their own tools to make it talk with whatever CMS they want. So far, they've built the connectors for Omeka, WP, and Drupal, but more would be awesome.
I'm not sure about using Scripto to record TEI, though, since the transcriptions are stored in MediaWiki.
Posted 5 years ago