In what follows I expand on the digital model for representing primary sources (such as manuscripts) that Prof. Tito Orlandi prototyped in his experimental digital edition of Machiavelli's De principatibus. These are my working notes, and therefore a work in progress.
See a 500-word abstract on this subject.
Any comments? Check my contact page.
The process of creating a scholarly edition of a literary work and its textual tradition is based upon a comparison (collatio) of the representations of the text in primary sources.
In order to do so, a digital scholarly edition must rely on digital representations (commonly called "digital transcriptions", but I'd rather say "digital models") of primary sources, formalised in a way that allows the computer to compare such representations with each other.
I will draw most of my examples from the textual transmission of Latin texts through medieval manuscripts and early print editions.
I will list some discrepancies between different written encoding systems for Latin.
Ancient Romans used a written alphabet (graphical encoding system) based on capital letters, with no distinction between upper case and lower case. They did not use punctuation to divide sentences, and in some cases did not use spaces to separate words. They did not perceive a phonetic distinction between a /u/ and a /v/ sound, so it may be said that our /u/ and /v/ phonemes cannot be mapped onto Latin. Therefore, they did not have separate [u] and [v] graphemes, but only a [V] grapheme, which stood for the Latin sounds that probably correspond to IPA /u/ and /w/. We may say that that alphabet (as a part of their graphical encoding system) did not include a grapheme [v].
A writing convention of the Modern Age distinguished between an [i] grapheme for vocalic Latin /i/ (as in "iter") and a [j] grapheme for semivocalic Latin /j/ (as in "jus"). That encoding system therefore included an extra [j] grapheme (as opposed to the one used by ancient Romans or by most contemporary print conventions).
Among contemporary Latin scholarly print editions, some are based on an alphabet with separate [u] and [v] graphemes ("votum"), some are based on an alphabet with only a [u] sign ("uotum").
Needless to say, the use of punctuation varies immensely across encoding systems, from ancient times, when it did not exist, through the different medieval and modern usages.
The usage of horizontal spaces to separate words is still problematic for Latin. One can mention a number of cases:
[To be continued...]
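The alphabet discrepancies listed above can be sketched in code. What follows is only a minimal illustration, not part of Orlandi's model: the three inventories, their names and the helper `unencodable` are my own assumptions, and each inventory is reduced to a single case with one character per grapheme.

```python
# Hypothetical grapheme inventories (illustrative only, one case each).
ANCIENT_CAPITALS = frozenset("ABCDEFGHIKLMNOPQRSTVXYZ")  # single [V], no [U], no [J]
MODERN_IJ_UV = frozenset("abcdefghijklmnopqrstuvxyz")    # separate [i]/[j] and [u]/[v]
MODERN_U_ONLY = frozenset("abcdefghiklmnopqrstuxyz")     # only [i] and only [u]


def unencodable(reading, alphabet):
    """Return the graphemes of `reading` that `alphabet` does not contain."""
    return [g for g in reading if g not in alphabet]


print(unencodable("VOTVM", ANCIENT_CAPITALS))  # []
print(unencodable("votum", MODERN_U_ONLY))     # ['v']
print(unencodable("jus", MODERN_U_ONLY))       # ['j']
print(unencodable("jus", MODERN_IJ_UV))        # []
```

A reading that is well formed in one system ("votum", "jus") simply cannot be written in another, which is the sense in which the systems do not overlap.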
The TEI module 11 Representation of Primary Sources seems to work for modern sources, such as contemporary writers' autographs or printed texts, for which it was primarily devised. It assumes that the encoding system (starting from the "alphabet") upon which each primary source is built is comparable with that of other sources, and with the one used for our own edition.
[To be continued...]
(The wording must be refined)
We must create digital representations of the written representations of the text in the primary sources (often called "digital transcriptions of primary sources").
For each written source, we must create (and declare explicitly) a digital model mirroring its (peculiar) written encoding system.
Given that the written encoding systems of different textual sources do not overlap, our digital models of those systems will not overlap either.
This makes the digital representations of the written representations of the text not directly comparable with each other by the computer.
In order to make those digital representations comparable, we must add a level of representation, and map
The former representation of that unit will be computationally comparable (traditional "collatio") to the corresponding representations of that unit in the digital representations of the other witnesses.
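A rough sketch of such an additional level, under assumptions of my own: every witness declares a table from its own graphemes to a shared comparison level, and collation happens only at that level. The witness names, the tables and the choice of a lowercase level with a single "u" and a single "i" are hypothetical; the notes do not yet fix what the target of the mapping should be.

```python
# Hypothetical per-witness models: own grapheme -> sign at the shared level.
# Assumed shared level: lowercase, one sign "u" for [u]/[v]/[V], one "i" for [i]/[j].
SHARED = "abcdefghiklmnopqrstuxyz"

WITNESS_MODELS = {
    "inscription": {**{c.upper(): c for c in SHARED if c != "u"}, "V": "u"},
    "modern_uv": {**{c: c for c in SHARED}, "v": "u", "j": "i"},
    "modern_u": {c: c for c in SHARED},
}


def to_shared(witness, reading):
    """Map a reading from the witness's own system to the shared level."""
    model = WITNESS_MODELS[witness]
    return "".join(model[g] for g in reading)


def collate(readings):
    """Group witnesses by the form their reading takes at the shared level."""
    groups = {}
    for witness, reading in readings.items():
        groups.setdefault(to_shared(witness, reading), []).append(witness)
    return groups


readings = {"inscription": "VOTVM", "modern_uv": "votum", "modern_u": "uotum"}
print(collate(readings))
# {'uotum': ['inscription', 'modern_uv', 'modern_u']}
```

The three readings differ as strings, yet they agree once each is read through its own declared model. A grapheme absent from a witness's table raises a `KeyError`, so an undeclared sign cannot slip into the comparison unnoticed.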
The passage from A to B above is another open issue:
&dns_tilde;
, the digital representation of the written brachygraphy "dñs" for "dominus" -- to B: "dominus" (that is, a string of ASCII characters representing "d", "o", "m" etc. in our digital encoding system)&dns_tilde;
) to C (the linguistic unit of "Latin" -- i.e. his model of the Latin language -- that he knows as the nominative singular form of the lemma "dominus, -i", masculine of the 2nd declension), and then from C to B (ASCII "d", "o", "m" etc.)?
If this is the case, the TEI <abbrev> code substantially fails in that it does not imply any distinction between
<abbrev> elements and
The use of the same ASCII (or Unicode) symbols for almost everything has the effect of hiding the underlying theoretical issue, and coding solutions such as the <abbrev> element fail to keep the two levels distinct.
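One way to keep the levels from collapsing into the same string type is to give each its own data type. The sketch below is only an assumption-laden illustration: the class names, the witness label "ms_X" and the two lookup tables are hypothetical, and the A to C to B route is the one hypothesised above, not an established procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphicSign:
    """Level A: a sign of the source's own written encoding system."""
    witness: str
    sign_id: str  # e.g. "dns_tilde" for the brachygraphy "dñs"


@dataclass(frozen=True)
class LinguisticUnit:
    """Level C: a unit of the editor's model of the Latin language."""
    lemma: str
    morphology: str


# Hypothetical editorial tables: A -> C (interpretation), C -> B (spelling).
SIGN_TO_UNIT = {
    GraphicSign("ms_X", "dns_tilde"): LinguisticUnit("dominus, -i", "nom. sg."),
}
UNIT_TO_EDITION = {
    LinguisticUnit("dominus, -i", "nom. sg."): "dominus",
}


def expand(sign):
    """Go from A to B explicitly through C, returning both C and B."""
    unit = SIGN_TO_UNIT[sign]
    return unit, UNIT_TO_EDITION[unit]


unit, text = expand(GraphicSign("ms_X", "dns_tilde"))
print(unit.lemma, unit.morphology, text)  # dominus, -i nom. sg. dominus
```

Because the sign, the linguistic unit and the edited string are different types, the interpretive step from A to C stays visible in the data instead of being hidden behind identical characters.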
To be continued...