Paolo Monella Post-doc scholarship in Digital Humanities Accademia dei Lincei, Rome 2012

EpiDoc Training Workshop: notes

The running (unedited, not-exhaustive) notes I am taking at the workshop: Reggio Calabria, June 4-7 2012.

Beware: I am not aiming to write everything down, nor even the main points. I'm only taking notes on what strikes me as new or sounds interesting to me for further reflection.

Monday, June 4, morning

Lou Burnard, lecture: What is text encoding? What is XML?

9.30 AM

Are a manuscript, a print edition and a digital edition of a text

the same thing?
different versions of the same thing?
different things?

I (Lou Burnard) do not have an answer to this question.

History of TEI: Antonio Zampolli helped to secure EU funding for the TEI in its initial phases.

We don't only want digital objects that are just "surrogate representations of original documents". What's "document" here? Example made by Lou Burnard: inscriptions.

A text is not a document [well said, pal!]

We digitize documents, we encode texts.

A document is something that exists in the world, which we can digitize.
A text is an abstraction, created by or for a community of readers, which we can encode.

Digitisation vs. encoding:

Digitisation: a copy of the appearance of the document.
Encoding: a reading of the text.

[But: is this awareness always kept firmly in mind by those who write the TEI Guidelines, and especially by those who use TEI?]

The TEI exists because there's a broad consensus that there may be a shared encyclopedia of what can be encoded about a text.

My own conversation with Lou Burnard. He says that it's not possible to transform a TEI file representing a "document" into a TEI file representing a "text" automatically through XSLT, because there's too much (human) interpretation involved. For the text/document distinction in TEI there's a new development (in the last two years): the sourceDoc TEI element. Elena Pierazzo is working on it. It's mostly useful for genetic editions (e.g. the Faust manuscripts).

11.45 AM Practical: Creating a "born digital" XML document.

Lou Burnard's lecture: What is the TEI?

TEI relevance: interchange of data between humans and (increasingly) machines.

"Frequently answered questions" - common technical questions for different application areas.

Lou Burnard's lecture: The Scope of the TEI. A quick overview of the TEI landscape

The TEI Guidelines provide a lexicon (521 elements, grouped into 146 classes and 22 modules) and a grammar (Relax NG).

5.10 PM: working with OxGarage Conversion to convert docx files into TEI P5 XML files.

Second day: June 5, 2012

9.30 AM

Lou Burnard's lecture: Textual Editing with the TEI Or, Documentary Editing with the TEI Or, What is genetic criticism and why are they saying such strange things about it?

N. B.: the paragraphs without [square brackets] are my summary of Lou Burnard's talk; the parts in [square brackets] are my own comments.

The TEI originally was thought for the text, not the document. It's a Text Encoding Initiative.

"As facilitator of multiple theories, the TEI tries to avoid a theoretical stance, but rarely succeeds ..."

No theoretical stance about textual scholarship (one text? many texts? an 'original' text? etc.)

The [XML/TEI lem/rdg] format is not sufficient to reliably regenerate the original sources, nor (probably) to represent efficiently the output from an automatic collation program.
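
[A minimal sketch of mine, not from the talk, of what the lem/rdg format does let you recover. It assumes the parallel-segmentation style of apparatus; the sigla #A and #B, the sample words and the helper name witness_text are all invented for illustration.]

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented two-witness sample: A reads "virumque", B reads "uirumque".
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">arma
<app><lem wit="#A">virumque</lem><rdg wit="#B">uirumque</rdg></app>
cano</p>"""


def witness_text(elem, siglum):
    """Flatten elem, keeping inside each <app> only the readings of one witness."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}app":
            # Keep only the <lem>/<rdg> whose wit attribute lists the siglum.
            for reading in child:
                if siglum in (reading.get("wit") or "").split():
                    parts.append(witness_text(reading, siglum))
        else:
            parts.append(witness_text(child, siglum))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
# Collapse whitespace so the two readings are easy to compare.
text_a = " ".join(witness_text(root, "#A").split())
text_b = " ".join(witness_text(root, "#B").split())
print(text_a)
print(text_b)
```

[What comes back is only the wording each witness carries at the points of variation. Layout, line breaks, abbreviations and anything else the apparatus did not record are lost, which seems to be the sense in which the sources cannot be reliably regenerated.]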

Better models for text variation: See detailed comments from the TEI MS SIG review of this chapter.

In particular see Schmidt's model of ‘multiversion documents’: Not unlike Sperberg-McQueen's Rhine-delta model from 1989, this probably provides a better data structure for representing the results of automatic collation.

In fact it seems that people don't care that much about pre-existing collations. They want to make their own, sharing outputs from collation engines such as Juxta.

[The rest of this presentation is really interesting; just check the original presentation]. [Angelo Del Grosso says: from a mathematical viewpoint, this is a "markup stream"].

[My question: is TEI interested in the Schmidt model, although it's conceived as an alternative to the TEI/XML data model?]

Can you transcribe primary sources using TEI? Until a year ago or so, you couldn't, because the T in TEI stands for text:

the <text> element contains a structured reading of a document's intellectual content ... its ‘text’

But now it is possible, because we have a new element called sourceDoc, which says "I'm transcribing only one document".

[If TEI is meant to represent a text, why does it have <lb> elements? And: is the "TEI transcription of primary sources" markup also meant to represent a text? And: why do we put a manuscript description in the TEI header? If it's only a philological annotation for where the text comes from, which is in any case useful to know, shouldn't the source description be enough?]

The <choice> element.
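
[A small illustration of mine, not from the slides: <choice> groups alternative encodings of the same point in the text, so one file can yield both a diplomatic and an edited reading. The sample words and the names flatten, DIPLOMATIC and EDITED are invented; abbr/expan, sic/corr and orig/reg are the usual TEI pairs.]

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample: one abbreviation and one scribal error.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "<choice><abbr>imp</abbr><expan>imperator</expan></choice> "
    "<choice><sic>agustus</sic><corr>augustus</corr></choice>"
    "</p>"
)

DIPLOMATIC = {"abbr", "sic", "orig"}  # what the source actually has
EDITED = {"expan", "corr", "reg"}     # what the editor reads it as


def flatten(elem, keep):
    """Flatten elem, keeping inside each <choice> only children named in keep."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{{{TEI_NS}}}choice":
            for alternative in child:
                # Strip the namespace to get the local element name.
                if alternative.tag.split("}")[-1] in keep:
                    parts.append(flatten(alternative, keep))
        else:
            parts.append(flatten(child, keep))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
diplomatic = flatten(root, DIPLOMATIC)
edited = flatten(root, EDITED)
print(diplomatic)
print(edited)
```

[Both readings live in the same file; which one you see depends on the processing, not on the encoding.]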

Peter Robinson says that medieval MSS encoding and modern authors' handwritings pose the same problems. Elena Pierazzo says they pose different problems:

the MS is written to be read by an audience
the authorial handwriting is not written to be read by an audience

TEI genetic editions working group: http://wiki.tei-c.org/index.php/Genetic_Editions

A completely new TEI structure: sourceDoc element. Now a TEI document can consist of 4 parts:

header
facsimile
sourceDoc
text

The sourceDoc element

a sibling of <text>, <teiHeader> and <facsimile>
represents the physical structure of a document, in terms of written <surface>s, and <zone>s of writing.
a <line> element is added to represent topographic lines
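
[To fix the idea for myself, a rough sketch (mine, not from the slides) of a TEI file with all four parts, read with Python's standard library. The sample is only a well-formed skeleton, not a schema-valid TEI document: the header and facsimile are left empty, and the surface number, zone types and line contents are invented.]

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Skeleton only: <teiHeader/> and <facsimile/> are empty placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <facsimile/>
  <sourceDoc>
    <surface n="1r">
      <zone type="main">
        <line>first topographic line</line>
        <line>second topographic line</line>
      </zone>
      <zone type="margin">
        <line>marginal note</line>
      </zone>
    </surface>
  </sourceDoc>
  <text><body><p>first topographic line second topographic line</p></body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# The four sibling parts, in document order.
top_level = [child.tag.split("}")[-1] for child in root]

# Walk the physical structure: surface > zone > line.
layout = [
    (surface.get("n"), zone.get("type"), "".join(line.itertext()))
    for surface in root.findall("tei:sourceDoc/tei:surface", NS)
    for zone in surface.findall("tei:zone", NS)
    for line in zone.findall("tei:line", NS)
]

print(top_level)
for entry in layout:
    print(entry)
```

[The sourceDoc side is organised by where the writing sits on the surface, while the <text> side is organised by paragraphs; the marginal note has no place in this <text> unless an editor decides where it belongs.]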

[Lou Burnard then goes into some description and examples of the new markup provided by the genetic edition working group: see slides]

[I'm omitting some lectures and practice exercises here]

Lecture: Marion Lamé, EpiDoc

[The <rdg> tag is used with a slightly different meaning in MS transcription and in EpiDoc. Does the EpiDoc usage imply a more specific definition of what a <rdg> is?]

[Problem: the rdg element should refer to the text, not to a document (it belongs to the Apparatus criticus TEI module). But EpiDoc should encode a document (an epigraph).]

Many TEI editorial decisions are not made explicit with tags. E.g. spaces between words are editorial decisions.

Marion Lamé shows a slide by Gabriel Bodard saying: "TEI app was designed for Lachmannian appcrit". [This is a limit, right?]

[TEI app crit is meant to generate, through XSLT, HTML with a Lachmannian app crit. Otherwise, why insert a head element with the text "Apparatus criticus" within the app element? But we should think and model beyond this task]

I thought that EpiDoc was pure document (not text) encoding. But it's not: it aims to be text encoding, and poses the same (text/document) conceptual issues as the MS transcription module (and the rest of the TEI).