Hello,
I'm working on an ongoing research project at my school where I will be encoding in TEI. I was just curious if anyone had a list of standard TEI tags, and specifically tags for SmallCaps, Hyphens, and M-lines.
Thanks!
As for your specific questions, I would use <hi rend="smallcaps"> for small caps and &#8212; (ampersand, pound, 8212, semicolon) for the mdash. I would leave the hyphen as a hyphen in most cases.
For questions like those, I would start somewhere like this guide at Brown: http://www.wwp.brown.edu/encoding/guide/index.html
or TEI by example: http://tbe.kantl.be/TBE/
Also, some projects expose their TEI, like the Willa Cather Archive; see, for example: http://cather.unl.edu/cat.0005.xml
Often I include a file of HTML entities in my TEI that allows me to use the HTML entity names in the XML. See https://gist.github.com/672959 for the file, and include it in your DOCTYPE:
<!DOCTYPE TEI.2 SYSTEM "http://text.lib.virginia.edu/dtd/tei/tei-p4/tei2.dtd" [
<!ENTITY % XMLSpecial PUBLIC "-//W3C//ENTITIES Special for XML//EN" "characters.ent">
%XMLSpecial;
]>
These are just for special characters; if you need to render text in a certain manner, you will use any number of rend styles (just be aware that you need to define how to handle that in your XSLT).
I always find this page to be a great resource:
http://www.tei-c.org/release/doc/tei-p4-doc/html/REFTAG.html
Feel free also to use what you can from the TEI handout Chris Forster and I created for our undergraduate digital lab:
http://uvatango.wordpress.com/class-materials/tei-handout-poetry-edition/
Please, please don't use http://www.tei-c.org/release/doc/tei-p4-doc/html/REFTAG.html:
that is for the old, deprecated TEI P4. Use
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ and points south.
I find it slightly worrying that people are still recommending TEI P4 resources. TEI P5 is a much more up-to-date release of the TEI Guidelines! TEI P5 was released in 2007 and has had 6-monthly maintenance and feature releases since then.
Please use the version at http://www.tei-c.org/Guidelines/P5/. P4, whilst still supported, really is deprecated, for all sorts of reasons too lengthy to go into here.
In answer to the original poster (Ian), I would suggest the same thing that Laura suggests. Textual highlighting of any form (such as a change to small caps, underlining, bold, or whatever) is just a use of the 'hi' element http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-hi.html unless it has specific semantics that you want to denote, in which case use something like 'emph', 'foreign', or 'term'. The real question to ask is why you are encoding small caps. Is it just because they are in small caps, or is it because they *are* something else that is important? If feasible, always use the element for the intellectual meaning of the text rather than its mere presentation. The other characters you mention are just Unicode characters; you don't need special character entities (also deprecated) or whatever to deal with them. In the very unlikely case where you are dealing with characters that are not in Unicode, the TEI provides a method (using the 'g' element) for describing characters that you are unable to display.
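To illustrate the point about Unicode, here is a minimal sketch in Python (standard library only) showing that an XML parser treats a literal em dash, the decimal reference &#8212;, and the hexadecimal reference &#x2014; as the same character, while a named entity such as &mdash; fails unless a DTD or entity file declares it. The sample lines and the helper name `named_entity_parses` are made up for illustration; this is not an official TEI tool.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same line. "\u2014" is the Python escape for a
# literal em dash (U+2014) typed directly into the file.
literal = "<l>I keep it, staying at Home\u2014</l>"
decimal = "<l>I keep it, staying at Home&#8212;</l>"
hexadecimal = "<l>I keep it, staying at Home&#x2014;</l>"

# After parsing, the text content is what downstream tools actually see.
texts = [ET.fromstring(s).text for s in (literal, decimal, hexadecimal)]
all_equal = texts[0] == texts[1] == texts[2]


def named_entity_parses(xml_text):
    """Return True if the document parses, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &mdash; is an HTML entity name, not one of XML's five built-in entities,
# so it is undefined here because no DTD declares it.
mdash_ok = named_entity_parses("<l>I keep it, staying at Home&mdash;</l>")

print(all_equal)   # the three spellings compare equal
print(mdash_ok)    # the undeclared named entity does not parse
```

So the choice between typing the character and using a numeric reference is a matter of editing convenience; the parsed text is identical either way.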
@Wayne: Why do you feel the need to use character entity references rather than just using Unicode directly? This TEI P4 method of doing things is dated and doesn't take advantage of all the new developments in the TEI, XML, and related technologies. Not that you can't use TEI P4, do whatever you want, but it is a bit like choosing to use Word version 2 (rather than, say, version 6 or Office 2007 or later) or Windows 98.
@mjockers: The TEI P5 page would be better: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/REF-ELEMENTS.html. People really should have stopped using TEI P4 in 2007.
@elotroalex: That is a good page. The TEI Guidelines themselves also contain a gentle introduction to XML that is a good place to start.
The right place to ask this question, if you are really using the TEI, is on the TEI-L mailing list: http://listserv.brown.edu/archives/cgi-bin/wa?SUBED1=tei-l&A=1/.
Hope this helps,
-James Cummings
TEI@Oxford
tei.oucs.ox.ac.uk
If you are starting with TEI, please don't use the P4 several people above have referred to.
The current Guidelines are TEI P5, at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/.
In answer to the question, I think I would ask why you are recording small caps. It's likely the
phrase in small caps has some semantic meaning which you should maybe capture.
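For comparison with the P4 DOCTYPE earlier in the thread, here is a rough sketch of what a bare-bones P5 document looks like: no DOCTYPE, a root element named TEI, and the P5 namespace http://www.tei-c.org/ns/1.0. The title and placeholder text are invented, and the Python wrapper (standard library only) checks only that the skeleton is well-formed and namespaced, not that it is valid against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace

# Hypothetical minimal document: the header needs a fileDesc with a title
# statement, a publication statement, and a source description.
MINIMAL_P5 = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample title (placeholder)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished working transcription.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from a printed source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Transcription goes here.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(MINIMAL_P5)

# ElementTree reports namespaced names as "{namespace}localname".
root_tag = root.tag
top_level = [child.tag.split("}")[1] for child in root]

print(root_tag)
print(top_level)
```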
James and Sebastian, I suspect people are still referring to the P4 in some cases because institutions (often libraries) standardized around it and -- being large, slow-moving beasts -- some of them have yet to make the conversion. But you're absolutely right to stress that new projects should use P5!
And thank you to James for pointing to the TEI-L mailing list! We set up DH Answers, in part, to help newer members of our community discover resources like that -- but also to be a place where people felt comfortable asking beginner's questions. I know TEI-L is a really welcoming list, but any expert community like that can seem a little daunting to a newbie and (speaking as a DH Answers moderator) we see some value in keeping this space open to TEI questions, too.
Replying to @Bethany Nowviskie's post:
Hi Bethany,
Yes, I understand that large, slow-moving institutions are still using P4, and I certainly agree that any new projects should not be using it. Using TEI P5 buys you not only much more up-to-date features (like manuscript description, facsimile, etc.) but also richer XML-related abilities, such as better ways to customise the TEI and document your customisation (TEI ODD, Roma, etc.) and the ability to embed elements from other namespaces (e.g. SVG, MathML, MEI) in your TEI and still be able to document and validate this.
I certainly don't want to take away from DH Answers; having TEI questions here is of course perfectly reasonable, as the TEI is a large player in the Digital Humanities. I'd encourage more newbies to post to TEI-L as well, though: it really is much friendlier than people seem to think. That impression is partly due to the sometimes quite technical nature of the discussion, but I for one would welcome more basic questions there. (Maybe that's because I can answer them ;-)) I'd say questions about which markup vocabulary or technology to use for a specific project are perfect for DH Answers. If it turns out that TEI is the appropriate answer in that case, then directing people to the TEI-L mailing list and Guidelines seems a good thing to me. Likewise, if people post on the TEI-L mailing list saying "Well, this isn't specific to the TEI, but what tools should I use to do X?", I think pointing them to DH Answers seems like a good idea. Similarly with other technologies and formats: asking here if XSLT is the right solution is a good idea, but then going to xsl-list@mulberrytech.com to actually ask XSLT questions is probably better.
Maybe DH Answers should have a sticky post somewhere of a list of useful DH resources? Or is that duplicating effort from elsewhere?
Hey, James -- good stuff! To keep this discussion on the TEI topic, I've instead responded to you over in our thread on suggestions for improving the DH Answers site!
Thanks for all the help. I'm very new to TEI, so I'm sorry if this doesn't make much sense, but to answer your question: I'm doing a research project in college, and the person in charge of me suggested I tag smallcaps because he plans on comparing two similar texts in real time. The texts are all slightly modified from the original, so smallcaps would matter.
Replying to @IanRobertson's post:
Hey Ian. @jamesc is far more authoritative on these matters than I am, but I think I can elaborate on his point to clarify what you need. In short: the <hi> tag can take care of your smallcaps issue. For instance, if you have a poem with the first word in small caps it might look like this:
<lg type="poem">
  <l><hi rend="smallcaps">Some</hi> keep the Sabbath going to Church—</l>
  <l>I keep it, staying at Home—</l>
  [...]
The point in determining what function the small caps serve is just to mark it up in the most appropriate way. If the small caps were section titles, then it might make sense to mark up not the typography (or maybe not only the typography) but to add something semantic, like a <div> or a <head> or whatever. I think that was @jamesc's point in noting "the real question to ask is why you are encoding small caps." In this case, indeed, it sounds like you're marking them up as small caps just because they're small caps, in which case <hi rend="smallcaps">...</hi> seems like a good solution.
Hope that clarifies.
Yup, that is exactly what I meant. :-) Chris is right, if you want to mark them up just because they are in small-caps then what you are (semantically) noting is that there is some highlighting of some sort there and that is it. So <hi rend="sc"> (or similar) is definitely your friend in that case. http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-hi.html
The 'comparing two texts in real time' seems interesting, and if you want help with that let us know!
-James
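Since the goal mentioned above is comparing two similar texts, here is a rough sketch of how the small-caps markup could be pulled back out once it is encoded. It assumes TEI P5 (elements in the http://www.tei-c.org/ns/1.0 namespace) and the attribute value rend="smallcaps"; the two sample witnesses and the function name `small_caps_spans` are hypothetical, and a real comparison tool would need to do much more than this.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace

# Hypothetical samples: two versions of the same lines, only one of which
# sets the first word in small caps.
WITNESS_A = f"""<lg xmlns="{TEI_NS}" type="poem">
  <l><hi rend="smallcaps">Some</hi> keep the Sabbath going to Church</l>
  <l>I keep it, staying at Home</l>
</lg>"""

WITNESS_B = f"""<lg xmlns="{TEI_NS}" type="poem">
  <l>Some keep the Sabbath going to Church</l>
  <l>I keep it, staying at Home</l>
</lg>"""


def small_caps_spans(xml_text, rend_value="smallcaps"):
    """Return the text of every <hi> whose rend attribute equals rend_value."""
    root = ET.fromstring(xml_text)
    return [
        "".join(hi.itertext())  # include text of any nested elements
        for hi in root.iter(f"{{{TEI_NS}}}hi")
        if hi.get("rend") == rend_value
    ]


spans_a = small_caps_spans(WITNESS_A)
spans_b = small_caps_spans(WITNESS_B)

print(spans_a)
print(spans_b)
print(spans_a != spans_b)  # True when the two versions differ in small caps
```

Whatever value you pick for rend ("smallcaps", "sc", or similar), using it consistently across all the files is what makes a lookup like this work.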
Hey all,
Thanks for helping Ian with this issue; he's been helping me encode texts for my Celestial Railroad project. I'm still wondering about the best practice for characters like mdashes. Following these various links, we've found every recommendation from using the HTML entity and identifying the entity in the header to using the actual numerical designator.
I do think we'll use <hi rend> for small caps. Thoughts about mdashes &c.?
--Ryan