Paolo Monella Information and functionality in SDEs Digital Papyrology 3.0 Conference, Parma 2022

Details

Title	Information and functionality in scholarly digital editions
Conference	Digital Papyrology 3.0. Digital Encoding and Critical Edition of Greek Papyri: Perspectives and Progress (programme)
When	Monday May 30, 2022, 10.30 CET
Where	Online
Organization	University of Parma, Italy
Language	English

Materials

Conference programme
Slides of my talk
- Shorter version (used at the conference)
  - PDF
  - ODP
- Longer version
  - PDF
  - ODP
- Please note that the ODP of the shorter version also includes the slides skipped during the talk; however, the graphs in the two ODP files differ (those in the longer version include more elements)

Abstract

Patrick Sahle wrote that a scholarly digital (not digitized) edition (SDE) “cannot be given in print without a significant loss of information or functionality”. Daniel Kiss argued that, with enough pages, any quantity of information might be “given in print” too. This talk tackles the general question of the “digital added value” of an SDE, compared to print, from the perspective of the information/functionality hendiadys.

We can perform Hjelmslev’s “analysis” on raw character data and formally identify “entities” both on the syntagmatic axis (textual structures and relations) and on the paradigmatic axis (tokens, lemmas, stylistic features, named entities). In the Italian tradition of Digital Humanities this operation is commonly called “formalizzazione” or “codifica”. In English/international terminology, the key terms for it are “markup” or, more generically, “annotation”.

1. On this basis, the first possible added value of an SDE is that we can formalize and visualize complex relations within the text (structure, syntax, metatext), at its threshold (paratext) and beyond a monolithic/abstract concept of “Text” (versions, text/document). An important issue arises with the information itself: once the concept of text “explodes” and includes metatext, paratext, parallel versions and material philology, the quantity of information grows exponentially, in a “fractal” way (one paragraph, two versions, four glosses – one for each version – and so on). Is it worth encoding it digitally? Also, visualization is the only function commonly applied here, which exposes these digital philology applications to Kiss’s argument. Ultimately, the question is: how much does each area of textual studies (papyrology, epigraphy, classical/medieval/genetic philology etc.) want to invest in the digital recording of such “fractal” information, with the sole purpose of visualizing it? It depends on how much each area is focussed on the plural nature of the text (versions) and on the documents bearing the texts.

2. The second possible added value lies in the semiotic concept of “isotopy”, defined by Greimas as “un ensemble redondant de catégories sémantiques”. If we formally identify entities on the paradigmatic axis (tokens, lemmas, stylistic features, named entities/Linked Open Data), algorithms can identify isotopies throughout a text – that is, they can track the recurrence of entities of the same class, such as lemmas of the same lexical field, similar linguistic and stylistic features, place names etc. The question now becomes: what do we do with those isotopies?
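As a minimal sketch of what “tracking the recurrence of entities of the same class” can mean in practice, the Python snippet below groups token positions by their annotated class. The token list, the class labels (`sea`, `war`) and the function name `find_isotopies` are all invented for illustration; they do not come from any specific edition or tool.

```python
from collections import defaultdict

# Hypothetical annotated tokens: (surface form, lemma, class).
# The classes "sea" and "war" are invented lexical fields; None means
# the token carries no annotation on the paradigmatic axis.
TOKENS = [
    ("waves", "wave", "sea"),
    ("struck", "strike", "war"),
    ("the", "the", None),
    ("ships", "ship", "sea"),
    ("and", "and", None),
    ("spears", "spear", "war"),
    ("sank", "sink", "sea"),
]


def find_isotopies(tokens):
    """Map each annotated class to the positions where it recurs."""
    isotopies = defaultdict(list)
    for position, (_form, _lemma, cls) in enumerate(tokens):
        if cls is not None:
            isotopies[cls].append(position)
    return dict(isotopies)


isotopies = find_isotopies(TOKENS)
```

Under these assumptions, each entry of `isotopies` is one isotopy: a class together with the ordered list of positions at which its members occur in the text.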

2.1 We can apply simple algorithms to create a linear visualization of the isotopy, i.e. of the recurrence of elements of the same class (highlighting, search, indices, maps). A possible objection here regards both information and functionality: print editions might theoretically record/visualize trivial information (such as the linguistic annotation of a morphological category) through formatting, but they do not do so because the mere visualization of such basic information would not bring any strong scientific advantage. If, instead, the information is more meaningful (e.g. people, names, concepts), print indices in a book may also track it throughout the text. This suggests that mere linear visualization of isotopies does not necessarily provide a compelling added value of SDEs over print editions.
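To make the objection concrete, here is a hedged sketch of a “simple algorithm” of this kind: it turns entity annotations into an index of line numbers, which is essentially what a printed index of names already offers. The sample lines are invented (they are not quotations from any papyrus), and `build_index` and `format_index` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical annotated edition: (line number, text, annotated entities).
# The text and the annotations are invented for illustration only.
LINES = [
    (1, "Apollonios writes to Zenon", ["Apollonios", "Zenon"]),
    (2, "about the grain in Philadelphia", ["Philadelphia"]),
    (3, "Zenon shall send it to Alexandria", ["Zenon", "Alexandria"]),
]


def build_index(lines):
    """Return an alphabetically ordered mapping entity -> line numbers."""
    index = defaultdict(list)
    for number, _text, entities in lines:
        for entity in entities:
            if number not in index[entity]:
                index[entity].append(number)
    return {entity: index[entity] for entity in sorted(index)}


def format_index(index):
    """Render the index in the linear form of a printed index of names."""
    return "\n".join(
        f"{entity}: {', '.join(str(n) for n in numbers)}"
        for entity, numbers in index.items()
    )


index = build_index(LINES)
printed = format_index(index)
```

The output is a flat list of entries and line numbers, so nothing in it goes beyond what a book can hold; this is the sense in which the visualization is “linear”.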

2.2 In addition, we can apply more complex algorithms to further process an isotopy (the recurrence of some elements) and produce secondary data with non-linear outputs. Examples of such algorithms include topic modelling, stylometry, word vectors or, if we use entities/Linked Open Data entities as input, social network analysis (for people) and network analysis (for places and other concepts). Outputs include tables, graphs and other forms of complex data visualization. In this case, the added value is apparent both in terms of information (the data produced is new, meaningful, and it is not encoded manually, but produced by software, thus removing the issue of limited time/human resources) and in terms of functionality (data is produced dynamically based on analysis algorithms and their adjustable parameters).
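The following sketch illustrates, under simplifying assumptions, how an isotopy of person entities might be processed into secondary, non-linear data: a weighted co-occurrence network, one very basic form of social network analysis. The documents and names are invented, and `cooccurrence_edges`, `degree` and the `min_weight` parameter are hypothetical; a real study would more likely rely on a dedicated network analysis library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: for each document, the person entities annotated in it.
DOCUMENTS = {
    "doc1": ["Apollonios", "Zenon"],
    "doc2": ["Zenon", "Panakestor"],
    "doc3": ["Apollonios", "Zenon", "Panakestor"],
    "doc4": ["Zenon", "Kleon"],
}


def cooccurrence_edges(documents, min_weight=1):
    """Count in how many documents each pair of people appears together.

    min_weight is an adjustable parameter: pairs that co-occur in fewer
    documents are dropped from the network.
    """
    weights = Counter()
    for people in documents.values():
        for a, b in combinations(sorted(set(people)), 2):
            weights[(a, b)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_weight}


def degree(edges):
    """Number of distinct neighbours of each person in the network."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


all_edges = cooccurrence_edges(DOCUMENTS)
strong_edges = cooccurrence_edges(DOCUMENTS, min_weight=2)
degrees = degree(all_edges)
```

Neither the edge weights nor the degree values were encoded by hand: they are computed from the annotations, and changing `min_weight` yields a different network from the same data.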

3. A third category of fairly apparent added values regards the social dimension of SDEs: the very availability of large plain text corpora (with the connected basic functions of browsing and string-matching search); social editing (based on shared research infrastructures such as papyri.info); Open Science (resource interoperability based on APIs, data reuse based on Open Data repositories).

In conclusion, compelling arguments for the added value of SDEs certainly come from the functionalities in the third category above (3. social dimension) and from the information and functionalities of category 2.2 (complex algorithms that process isotopies and produce a non-linear output). The advantage produced by category 2.1 (simpler algorithms that produce a linear visualization of isotopies) is less compelling. This suggests that the development of computational text analysis methods (2.2) is a key challenge for digital philology. As for category 1 (visualization of textual relations), only those areas of textual studies which are more deeply concerned with “plural” texts and with the text/document relation currently find it convenient to invest vast resources (in terms of time, training and funding) to encode that kind of potentially “fractal” information.