What's the best practice for choosing filenames for large projects? Is there an archival standard that one should use?
(7 posts) (6 voices)-
Posted 8 years ago Permalink
-
Replying to @Julie Meloni's post:
In this case, I'm digitizing large runs of various magazines. The files will be: .tiff, .jpg, .xml (TEI, MODS, METS).
Basically, I need a naming standard that conforms to good archival naming practices. Is there a "best practice" for combining project names and item numbers in the same filename? For example, say "Victorian magazine" has 250 individual pages, and thus 250 individual .tiff images. These images will be combined into one PDF document, and then the XML files will be created.
Posted 8 years ago Permalink -
I don't know if there's a best practice available, but I'd do it this way:
MagazineName.Issue.Volume.Year.Page.extension
You might want to leave out the dots between each item, though. I mostly put them there for readability. This method ensures unique filenames that are still meaningful to a human reading them.
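A minimal sketch of how names in that order could be generated, in Python. The function name `page_filename`, the sample values, and the zero-padding widths are assumptions added for illustration; the padding is only there so that names sort correctly as plain text.

```python
def page_filename(magazine, issue, volume, year, page, extension):
    """Build a name in the order MagazineName.Issue.Volume.Year.Page.extension."""
    # Zero-padding widths are an assumption, chosen so that page 10 sorts
    # after page 9 when filenames are compared as text.
    return f"{magazine}.{issue:02d}.{volume:02d}.{year:04d}.{page:03d}.{extension}"


print(page_filename("VictorianMagazine", 3, 12, 1865, 7, "tiff"))
# prints VictorianMagazine.03.12.1865.007.tiff
```

Generating names from one function, rather than typing them by hand, keeps the pattern consistent across all 250 pages.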
Posted 8 years ago Permalink -
Although I'd recommend not relying too heavily on file-level naming for organization, you might want to consider what would be the most likely scenarios for quickly sorting and manipulating files (as @mwidner suggests):
- by date: year-magazine-volume-issue-page.extension
- by publication: magazine-volume-issue-page-year.extension
In some ways I'd say that file naming is less important than an external organizational structure, like having an SQL or XML structure that uniquely and reliably identifies each file and provides additional metadata – but maybe that's heavier than you need for now.
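As a rough illustration of the external-structure idea, here is a sketch using Python's built-in `sqlite3` module. The table name `page_files`, its columns, and the sample rows are all hypothetical. The point is only that once the metadata lives in a table, the same files can be sorted by date or by publication without the filename having to encode either order.

```python
import sqlite3

# In-memory database for the sketch; a real project would use a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE page_files (
           filename TEXT PRIMARY KEY,
           magazine TEXT NOT NULL,
           volume   INTEGER,
           issue    INTEGER,
           year     INTEGER,
           page     INTEGER
       )"""
)

# Hypothetical sample rows: the filenames carry no meaning of their own.
rows = [
    ("000001.tiff", "Victorian Magazine", 12, 3, 1865, 1),
    ("000002.tiff", "Victorian Magazine", 12, 3, 1865, 2),
    ("000003.tiff", "Another Magazine", 1, 1, 1870, 1),
]
conn.executemany("INSERT INTO page_files VALUES (?, ?, ?, ?, ?, ?)", rows)

# Sort by date first, then by publication.
by_date = [r[0] for r in conn.execute(
    "SELECT filename FROM page_files "
    "ORDER BY year, magazine, volume, issue, page")]

# Sort by publication first.
by_publication = [r[0] for r in conn.execute(
    "SELECT filename FROM page_files "
    "ORDER BY magazine, volume, issue, page")]

print(by_date)         # ['000001.tiff', '000002.tiff', '000003.tiff']
print(by_publication)  # ['000003.tiff', '000001.tiff', '000002.tiff']
```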
Posted 8 years ago Permalink -
I'd really recommend *against* trying to come up with filenames that have lots of semantic content for human readers. See this section of Greg Crane's chapter on document management and filenaming, in _Electronic Textual Editing_: http://bit.ly/a02Sgk See also Daniel Pitti's chapter on Project Management in The Companion to Digital Humanities, here: http://bit.ly/bskILq -- particularly this paragraph:
On the other hand, it is quite common to create complex, semantically overburdened file names when using file names and directories to manage files. Such file names will typically attempt, in abbreviated form, to designate two or more of the following: source repository, identity of the object represented in a descriptive word, creator (of the original, the digital file, or both), version, date, and digital format. If the naming rules are not carefully documented, such faceted naming schemes generally collapse quite quickly, and even when well documented, are rarely effective as the number of files increases. Such information should be recorded in descriptive and administrative data linked to the address of the storage location of the file. A complete address will include the address of the server, the directory, and the file name. The name and structure of the directory and file name should be simple, and easily extensible as a collection grows.
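One possible reading of that paragraph, sketched in Python: keep the file name simple and record the descriptive and administrative data alongside a complete address made of server, directory, and file name. The class `StoredFile`, the host `files.example.org`, and the metadata keys are hypothetical and not taken from either chapter.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFile:
    server: str       # address of the server
    directory: str    # simple directory path
    filename: str     # simple file name with no embedded semantics
    metadata: dict = field(default_factory=dict)  # descriptive/administrative data

    def address(self) -> str:
        """Join server, directory, and file name into one complete address."""
        return "/".join([self.server, self.directory.strip("/"), self.filename])


record = StoredFile(
    server="files.example.org",
    directory="magazines/0001",
    filename="000007.tiff",
    metadata={"title": "Victorian magazine", "page": 7, "format": "TIFF"},
)
print(record.address())  # files.example.org/magazines/0001/000007.tiff
```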
Posted 8 years ago Permalink -
For the Every Week magazine project we are working on at the CDRH, we have the following naming convention:
ew.issue.19150503.p001.tif
ew.issue.19150503.p001.jpg
ew.issue.19150503.xml
So, for the tifs and jpgs: a short prefix for the magazine name; the word "issue"; the date as yyyymmdd; the page number; the extension.
For the xml: the same formula minus the page number.
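For anyone who wants to check names against a convention like this, here is a small Python sketch that parses the three examples above. The regular expression and the function name `parse_name` are my own assumptions based only on those examples, not the project's actual tooling. The pattern is deliberately loose: it treats the page part as optional for any extension.

```python
import re
from datetime import datetime

# prefix . "issue" . yyyymmdd [. pNNN] . extension
PATTERN = re.compile(
    r"^(?P<prefix>[a-z]+)\.issue\.(?P<date>\d{8})"
    r"(?:\.p(?P<page>\d{3}))?\.(?P<ext>tif|jpg|xml)$"
)


def parse_name(filename):
    """Split a filename into (prefix, issue date, page or None, extension)."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"does not follow the naming convention: {filename}")
    # strptime also rejects impossible dates such as 19151340.
    issue_date = datetime.strptime(match["date"], "%Y%m%d").date()
    page = int(match["page"]) if match["page"] else None
    return match["prefix"], issue_date, page, match["ext"]


print(parse_name("ew.issue.19150503.p001.tif"))
# ('ew', datetime.date(1915, 5, 3), 1, 'tif')
print(parse_name("ew.issue.19150503.xml"))
# ('ew', datetime.date(1915, 5, 3), None, 'xml')
```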
Posted 8 years ago Permalink