Verifying the Accuracy of Double Keying - any studies? « Digital Humanities Questions & Answers



(7 posts) (4 voices)

Asked 5 years ago by lit_cht
Latest answer from lit_cht
This question has a best answer.


lit_cht
Member
Dear colleagues,

As a staff member of the Deutsches Textarchiv (www.deutschestextarchiv.de), I would be interested in studies on the accuracy of double keying, especially in large, TEI/XML-annotated full-text corpora. Service providers advertise accuracy rates of 99.+%; has anyone ever questioned this on an empirical basis? I have found various musings here on OCR accuracy and on how to improve its results, but nothing similar on double keying. I would be grateful for any hints.

Thanks in advance,
all the best
Christian Thomas

(I also mailed this to Humanist, since I hope the readership there is not exactly the same as here. If it is, please excuse the cross-posting.)
Posted 5 years ago
Dorothea Salo
Member

Well, if we're talking about TEI, there are at least two possible meanings of "accuracy" here: accuracy of rekeying of the (presumably) printed text, and accuracy of encoding.

99%+ rekeying accuracy via double-keying is almost certainly a reliable estimate for any competent operation; the medical literature (oddly enough!) has done some empirical studies on double-keying. Since encoding is judgment-dependent, though, accuracy there will depend heavily on how clear and detailed your instructions to the keyers are.

Back in the day when I worked for a publishing service bureau, we got excellent results from outsourced double-keying, including when markup was involved. Personally, I wouldn't stress too much over rekeying accuracy; focus on quality instructions for the TEI, and you should get decent results back.

Posted 5 years ago
PFSchaffner
Member
Best Answer
Replying to @lit_cht's post:
I am not aware of any studies of the accuracy of keyed text, but have a lot of experience assessing that accuracy over the past 14 years. Many of the digitization projects undertaken by the University of Michigan library over that period (especially the Middle English Dictionary and Compendium, and the nearly 50,000 books produced so far by the Text Creation Partnership: EEBO-TCP, Evans-TCP, and ECCO-TCP) were generated by outsourced re-keying, sampled and proofread in-house (usually at a 5% sampling rate), and evaluated against a nominal spec of 99.995% accuracy (i.e., fewer than one error per 20,000 bytes). Having stuck with this approach so long, clearly we find something to recommend in it.
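
To make the spec concrete, here is a minimal Python sketch of the arithmetic, assuming accuracy is simply one minus errors per byte. The function names are illustrative only and are not part of any TCP tooling.

```python
from fractions import Fraction


def bytes_per_error(accuracy_percent: str) -> Fraction:
    """Bytes of text per permitted error for a given accuracy spec.

    The spec is passed as a decimal string (e.g. "99.995") so the
    arithmetic stays exact instead of picking up float rounding.
    """
    error_rate = 1 - Fraction(accuracy_percent) / 100
    return 1 / error_rate


def error_budget(sample_bytes: int, accuracy_percent: str) -> Fraction:
    """Error count at which a sample of `sample_bytes` sits exactly on the spec."""
    return sample_bytes * (1 - Fraction(accuracy_percent) / 100)
```

Under that assumption, `bytes_per_error("99.995")` comes to 20,000, and a proofread sample of 100,000 bytes sits exactly on the spec at 5 errors.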

A few points and conclusions:
- We have never seen any point to specifying a particular *method* (and indeed some of the commercial processes are proprietary secrets): we specify a given accuracy of output, and leave it to the vendors whether that is best achieved by single keying, double blind keying, keying-and-proofing, or the use of their own algorithms, dictionaries, etc.
- 'Accuracy' and 'accuracy rate' are surprisingly problematic notions for all but the simplest and cleanest texts. Any realistic definition has to take into account the huge problem of ambiguous glyphs and glyph-to-character mappings, even when the source is clean, the typeface modern, and the orthography consistent. When the source is 'dirty' (image quality degraded), the typeface(s) old and obscure (or when handwriting is involved), or the text orthographically odd, it becomes obvious how subjective a concept 'accuracy' really is. Transcription can be every bit as interpretive as markup, and cannot really be separated from it.
- Our procedure for the TCP texts is to generate (by script) a random sample of roughly 5% of the pages in any incoming text, convert the sample to HTML for ready display, and present it alongside page images of the source for comparison by our proofreaders. Because of the 'noisiness' of many of our books, the proofreaders are instructed to distinguish between 'forced' and 'unforced' errors (the former attributable to some defect or ambiguity in the source), to count only the 'unforced' errors against spec, and to request the resubmission of books that fail to meet spec. The error count is measured against the byte count of the sample, markup excluded. There are exceptions for particular circumstances, editors retain the power to pardon as well as to reject, and small (sub-20KB) books are usually bundled together for sampling and proofing as a batch.
- Slippery definitions aside, double keying (or equivalent processes) can produce extraordinarily accurate text. When the source is of high quality and consistency, it can approach perfection, with measured accuracy rates that are statistically indistinguishable from 100%. Many of our current TCP texts are at that level; so were the transcriptions done for the e-MED.
- When keyed text fails our tests, it is difficult to know, from outside the black box where we sit, whether it is because true double keying (or equivalent) was not used, because it was deployed improperly, or whether it was intrinsically not up to the task. And the factors that most obviously affect accuracy are very difficult to quantify: image quality, linguistic difficulty (including the presence of abbreviations and variant orthographies), format and layout, typography, and the expertise and experience of the individual keyer.
- For there is an irreducible human factor to keying that resists quantification and systematization. At the level of the individual keyer, as the source becomes murkier, the process becomes more one of reading and less one of glyph decipherment--and accurate reading (as a somewhat mysterious gestalt process) varies widely with the talent, linguistic expertise, experience and focus of the individual keyer. We have received transcriptions from some keyers that seem well-nigh miraculous in their ability to extract meaningful text from a blotted, bled-through, or faint source; but also transcriptions that clearly fell too readily for the source's snares and ambiguities, or lacked sufficient knowledge of the language and conventions of the source to interpret it correctly or recognize anomalies.
- At the corporate level, too, one has to be careful to create incentives that match the character of one's source material: corporations are people too. E.g., if the source is potentially difficult, be sure not to penalize good (or even bad) guesswork, or you will end up with a text full of "illegible" flags rather than characters (conversion firms tend to prefer safe to sorry). And I cannot agree with advice not to worry about accuracy or enforce your accuracy specification. Failure to test and verify is itself an incentive, one that will surely be taken advantage of.
- The obverse to the human factor is this: the very problems that make keying sometimes challenging can make OCR impossible. Keying is better than OCR even when the material is least problematic: as it becomes more so, keying pulls away and leaves OCR in the dust.
- We have raw proofreading numbers (byte counts, sample sizes, errors found) that we can probably extract for a decade's worth of keying. I'm not sure what you'd do with them, but do let me know if you want me to try to pull them for you.
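
The sampling and spec check described in the procedure point above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the TCP scripts themselves: the function names, the crude regex used to exclude markup, and the way the 5% sample is rounded are all hypothetical.

```python
import random
import re

# Crude tag stripper standing in for "markup excluded"; an assumption,
# real XML handling would need a proper parser.
TAG = re.compile(r"<[^>]+>")


def sample_pages(page_ids, rate=0.05, seed=None):
    """Draw a random sample of roughly `rate` of the pages (at least one)."""
    pages = list(page_ids)
    rng = random.Random(seed)
    k = max(1, round(len(pages) * rate))
    return sorted(rng.sample(pages, k))


def text_bytes(page_xml):
    """Byte count of one page's text once the markup is removed."""
    return len(TAG.sub("", page_xml).encode("utf-8"))


def meets_spec(unforced_errors, sample_bytes, max_error_rate=1 / 20_000):
    """True if the sample is under the error-rate spec.

    Only 'unforced' errors are passed in; 'forced' ones (caused by a
    defect or ambiguity in the source) are left out by the proofreader.
    """
    return unforced_errors / sample_bytes < max_error_rate
```

For a 200-page book, `sample_pages(range(1, 201), seed=1)` would pick 10 pages; if those pages hold 40,000 bytes of text, one unforced error passes and three would trigger a request for resubmission.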
Good luck!
Posted 5 years ago
Dorothea Salo
Member

I should probably clarify at this point (since PFSchaffner is absolutely right to question the "oh, just leave it to the vendor" vibe I was giving off) that at a publishing-services bureau, what's getting keyed is generally recently-printed books or well-edited manuscripts. Many fewer (though not quite zero) interpretation problems!

Once you're working with less clean materials, see PFSchaffner's response; the possibilities for error and "error" are indeed vast.

Posted 5 years ago
Wesley
Member

In response to your Humanist post, I can offer some insight. I double-keyed the text of a long novel for my dissertation. My purpose, though, was not to assess the accuracy of double-keying but to achieve the most accurate textual record and to state clearly my estimate of the errors, so it is not truly a controlled test. I assume that one text was single-keyed. I keyed the text myself. I compared the two keyings. And I proofread.

In a text of approximately 700,000 characters, I believe that around 200 characters were in error after double-keying. (99.999 percent accurate). By a system of planting errors, I was able to estimate that oral proofreading captured 80 percent of errors that remained.

When I checked another text (single-keyed, I’m quite certain) it was accurate at a rate of 97.5 percent.

The dissertation is posted here.

http://www3.iath.virginia.edu/wnr4c/Raabe.Era.UTC.Diss.pdf

See pages 37 to 38 for a discussion of the method for planting and estimating errors, and page 63 for the estimate of accuracy.
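
For readers unfamiliar with error planting, the basic idea can be sketched in Python as below. This is a generic illustration of seeding under the assumption that real errors are caught at the same rate as planted ones; it is not the procedure from the dissertation, the function names are hypothetical, and the counts in the usage note are made up.

```python
def accuracy_percent(errors: int, characters: int) -> float:
    """Character-level accuracy as a percentage."""
    return 100 * (1 - errors / characters)


def capture_rate(planted_found: int, planted_total: int) -> float:
    """Share of deliberately planted errors that proofreading caught."""
    return planted_found / planted_total


def estimate_remaining(real_found: int, rate: float) -> float:
    """Estimated real errors still in the text after proofreading.

    Assumes real errors are detected at the same rate as planted ones.
    """
    estimated_total = real_found / rate
    return estimated_total - real_found
```

With made-up counts: if 40 of 50 planted errors are caught, `capture_rate(40, 50)` is 0.8, and finding 160 real errors at that rate suggests about 40 are still in the text. Taking the figures quoted above at face value, `accuracy_percent(200, 700_000)` comes to about 99.97 percent, so the 99.999 figure may rest on a different count; the dissertation is the place to check.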

Posted 5 years ago
lit_cht
Member

Dear Colleagues, thank you very much for your helpful insights, hints and suggestions. We will continue to run a study of our own with the various volumes double-keyed in the past 3 years for Deutsches Textarchiv (http://www.deutschestextarchiv.de/). The presentation will be at tei-mm in Wuerzburg (http://www.zde.uni-wuerzburg.de/veranstaltungen/tei_mm_2011/). Hope to see you there, best wishes,
Christian Thomas

Posted 5 years ago
lit_cht
Member

Replying to @lit_cht's post:
Dear Colleagues, though it has been some time since this discussion came to a preliminary end, I want to thank you again for all your hints. For those generally concerned with quality assurance in large TEI corpora, Deutsches Textarchiv's recently published article might be of interest. The English abstract is below; the text itself is in German. All the best
- for the DTA-Team -
Christian Thomas

Alexander Geyken, Susanne Haaf, Bryan Jurish, Matthias Schulz, Christian Thomas, Frank Wiegand: "TEI und Textkorpora: Fehlerklassifikation und Qualitätskontrolle vor, während und nach der Texterfassung im Deutschen Textarchiv." In: Jahrbuch für Computerphilologie, http://www.computerphilologie.de/jg09/geykenetal.html [retrieved 2012-08-09].

Abstract:
This paper deals with the issue of quality assurance in very large, XML/TEI-encoded full-text collections. The text corpus edited by the DFG-funded project Deutsches Textarchiv (henceforth: DTA), a large and still growing reference corpus of historical German, is a fine example of such a collection. The following remarks focus on text prepared in a Double-Keying-process, since the major part of the DTA-corpus is compiled by applying this highly accurate method. An extensive and multi-tiered approach, which is currently applied by the DTA for the analysis and correction of errors in double-keyed text, is introduced. The process of quality assurance is pursued in a formative way in order to prevent as many errors as possible, as well as in a summative way in order to track errors which nevertheless may have occurred in the course of full-text digitization. To facilitate the latter, DTAQ, a web-based, collaborative tool for finding and commenting errors in the corpus, was developed. On the profound basis of practical experience in the past four years, the preliminaries and possible methods of conducting a widespread quality assurance are being discussed.

Posted 4 years ago
