Paolo Monella - Fabio Cusimano Linking Text and image: TEI XML and IIIF
1. Details
On Wednesday, July 3, 2019, Paolo Monella and Fabio Cusimano led a workshop entitled Linking Text and image: TEI XML and IIIF in the framework of the Summer School ReIResources: Sharing Resources in a Networked Digital Ecosystem (Bologna, Italy, July 3-5), organized by ReIReS (Research Infrastructure on Religious Studies) and Fscire (Fondazione per le scienze religiose) Giovanni XXIII in partnership with AIUCD (Associazione per l'Informatica Umanistica e la Cultura Digitale) and the Veneranda Biblioteca Ambrosiana.
Paolo Monella led part 1 of the workshop (on TEI XML); Fabio Cusimano led part 2 (on IIIF).
2. Abstract
Part 1 (TEI XML)
In the first part of the workshop, led by Paolo Monella and centered on digital textual modelling and TEI XML, students will create a digital (formal, machine-actionable) model of a portion of a text from a medieval manuscript, both gaining hands-on experience and reflecting on the methodological and theoretical foundations and issues of textual modelling.
They will follow an inductive path, moving from the elementary structures of the computer (a sequence of binary states, "on/off", "yes/no", often represented by "0" and "1") to binary and decimal numbers and charsets (ASCII and Unicode).
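As a small illustration of the step from binary states to charsets, the following Python sketch shows how a single character maps to a Unicode code point (a decimal number), to the binary form of that number, and to the bytes stored in UTF-8. The function name `describe_char` is only illustrative and is not part of the workshop materials.

```python
def describe_char(ch):
    """Return (code point, binary string, UTF-8 bytes) for one character."""
    code_point = ord(ch)               # decimal number assigned by Unicode
    bits = format(code_point, "08b")   # same number in binary, at least 8 digits
    stored = ch.encode("utf-8")        # bytes actually written in UTF-8
    return code_point, bits, stored


for ch in "Aè":
    code_point, bits, stored = describe_char(ch)
    # ASCII only covers code points 0-127; Unicode covers both characters.
    charset = "ASCII and Unicode" if code_point < 128 else "Unicode only"
    print(ch, code_point, bits, stored, charset)
```

The letter "A" fits in the ASCII range, while "è" needs a larger table such as Unicode and takes two bytes in UTF-8.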
At this point, the hands-on experience will start: students will create their own textual markup language based on symbols of their choice and will be asked to reflect on the theoretical and methodological issues arising from inline markup.
They will then be introduced to the SGML/XML syntax and to the TEI XML vocabulary and will encode a brief textual portion taken from a medieval manuscript, based on its digital images and using the TEI module for the transcription of primary sources.
The students will then be presented with, and will practice, two alternative strategies for combining TEI XML, the current standard for scholarly text encoding, with IIIF, the rising standard for online image metadata and annotation:
- the first approach consists in linking to the digital images of the manuscript from within the TEI XML source, for example with the TEI attribute @facs;
- with the second approach, the whole TEI XML transcription is included in the IIIF metadata as an "Annotation".
This will constitute a bridge with the second part of the workshop, led by Fabio Cusimano, focussed on IIIF.
Part 2 (IIIF)
The second part of the workshop will be focused on digitization good practices, digital library design and IIIF (International Image Interoperability Framework).
Fabio Cusimano will introduce these topics as tiles of a complex mosaic, starting from a real-life case study: the on-going digitization experience at the Veneranda Biblioteca Ambrosiana in Milan.
Then, the students will be introduced to the IIIF Web-based approach as a way to literally unlock digital collections thanks to LD (Linked Data). The path leads from the concept of the capsa librarum, or of the bibliotheca (as the etymology of the word itself suggests), to the open and freely accessible library in the digital dimension.
3. Workshop plan
Trainer | Module | From | To | Topic/activity |
---|---|---|---|---|
Monella | Digital textual modelling | 11.00 | 11.20 | Concepts of model, formal model and digital model |
Monella | Digital textual modelling | 11.20 | 11.40 | Let's build a digital textual model: binary numbers, decimal numbers, charsets (ASCII and Unicode), textual markup |
Monella | TEI XML | 11.40 | 11.50 | TEI (Text Encoding Initiative) XML |
Monella | TEI XML | 11.50 | 13.00 | Encoding a portion of a manuscript in TEI XML based on the manuscript images |
Lunch break | | 13.00 | 15.00 | |
Cusimano | The Veneranda Biblioteca Ambrosiana | 15.00 | 15.15 | The Veneranda Biblioteca Ambrosiana and its new digital infrastructure |
Cusimano | Designing a new digital library devoted to manuscripts | 15.15 | 15.30 | Facing the preservation risks |
Cusimano | Designing a new digital library devoted to manuscripts | 15.30 | 16.00 | Some good practices in digitization |
Cusimano | A new approach: IIIF - International Image Interoperability Framework | 16.00 | 16.20 | IIIF Core APIs: Image API & Presentation API |
Cusimano | A new approach: IIIF - International Image Interoperability Framework | 16.20 | 16.40 | IIIF Canvas and the .json Manifest |
Cusimano | A new approach: IIIF - International Image Interoperability Framework | 16.40 | 17.00 | IIIF & the image viewer Mirador: playing with images |
Cusimano | A new approach: IIIF - International Image Interoperability Framework | 17.00 | 17.20 | The image viewer Mirador and the UI as a research tool: annotating images |
Monella | Linking TEI and IIIF (International Image Interoperability Framework). Two strategies | 17.20 | 17.50 | TEI 2 IIIF: Linking from within the TEI XML transcription source code (attribute @facs) to IIIF |
Monella | Linking TEI and IIIF (International Image Interoperability Framework). Two strategies | 17.50 | 18.20 | IIIF 2 TEI: Including the TEI XML transcription in the IIIF JSON metadata as "Annotation" |
Cusimano - Monella | Post-workshop dissemination | 18.20 | 18.30 | Brainstorming: how would you train your colleagues on what you have learned during this workshop? |
4. Materials
4.1 Framasoft shared pad
4.2 From digital textual modelling to TEI XML
- Example of TEI XML encoding of lines 1-5 of folio 13v of manuscript Ambr. D 23 sup.
- Let's encode lines 3-14 of folio 13r of manuscript Ambr. D 23 sup. in TEI XML:
  - Template file to start from
  - Digital facsimile of the page
  - Plain text transcription
  - Complete TEI XML transcription
4.3 Linking TEI and IIIF: TEI 2 IIIF
- Linking (from within TEI) to a static image:
<pb n="13r" facs="13r.jpg"/>
- OxGarage: converting TEI XML to HTML for visualization. Instructions:
  - Left: Convert from → Documents → TEI P5 XML Document
  - Right: Convert to → xHTML
  - Left: Select file to convert → button Browse/Sfoglia → select and upload your TEI XML file (mytranscription.xml)
  - Right: Upload images → button Browse/Sfoglia → select and upload the image with the manuscript page facsimile
  - Bottom, center: click on the Convert button
  - In a few seconds, a download dialog window appears → click Save/Salva to save the HTML file mytranscription.html
  - Open the downloaded HTML file (double click; your default browser will open it)
- Linking (from within TEI) to a whole IIIF JSON manifest:
<pb n="13r" facs="http://213.21.172.53/manifests/public/0b002711800e7d6d.json"/>
- Linking (from within TEI) to a specific canvas (folio) within the IIIF JSON manifest:
<pb n="13r" facs="http://213.21.172.53/manifests/public/0b002711800e7d6d.json#/sequences/0/canvases/35"/>
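To make the last kind of link concrete, the sketch below reads a `<pb/>` element with Python's standard library and splits its `@facs` value into the manifest URL and the path that follows the `#`. Reading that fragment as a slash-separated path into the manifest JSON is an assumption based on the example above, not something the TEI or IIIF specifications prescribe; the function name `split_facs` is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# The <pb/> element from the example above, as a string.
PB = ('<pb n="13r" facs="http://213.21.172.53/manifests/public/'
      '0b002711800e7d6d.json#/sequences/0/canvases/35"/>')


def split_facs(pb_xml):
    """Return (folio label, manifest URL, path steps) for one <pb/> element."""
    pb = ET.fromstring(pb_xml)
    manifest_url, fragment = urldefrag(pb.get("facs"))
    # Assumption: the fragment is a slash-separated path into the manifest JSON.
    steps = [step for step in fragment.split("/") if step]
    return pb.get("n"), manifest_url, steps


folio, manifest_url, steps = split_facs(PB)
print(folio)
print(manifest_url)
print(steps)
```

For a link to a static image such as `facs="13r.jpg"`, the same function returns the file name and an empty list of steps, so the three linking styles can be told apart by what follows the `#`.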
4.4 Linking TEI and IIIF: IIIF 2 TEI
4.4.1 Workshop activities
- Mirador:
- Visualize an annotation
- Create an annotation
- Hacking the IIIF JSON code
4.4.2 Hacking the IIIF JSON code: visualize (read) an annotation
- IIIF link to be visualized with Mirador (MS Ambr. D 23 sup.)
  - Folio 13v (IIIF JSON Annotation already published): scroll right to 036_D23sup_c.13v
  - Folio 13r (the page we transcribed): scroll right to 035_D23sup_c.13r
- IIIF manifest (JSON file).
  - Path to annotationList on page 13v: [Collapse all] sequences / 0 / canvases / 35 / otherContent / 0 / @id (or find / control-F "otherContent")
- IIIF annotationList on page 13v (JSON file).
  - Path to actual transcription: [Collapse all] resources / 0 / resource
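The two paths above can also be followed programmatically. The sketch below is a minimal illustration in Python: `follow_path` is a hypothetical helper, and the small `manifest` and `annotation_list` dictionaries are made-up stand-ins (with `example.org` URLs) for the real JSON files, which would normally be downloaded and parsed first.

```python
def follow_path(document, path):
    """Walk a path such as 'resources / 0 / resource' through parsed JSON."""
    node = document
    for step in (part.strip() for part in path.split("/")):
        # Arrays (lists) are indexed by number, objects (dicts) by key.
        node = node[int(step)] if isinstance(node, list) else node[step]
    return node


# Stand-in manifest: 36 canvases, each pointing to its own annotation list.
manifest = {
    "sequences": [{
        "canvases": [
            {"otherContent": [{"@id": f"http://example.org/list/{i}.json"}]}
            for i in range(36)
        ]
    }]
}

# Stand-in annotation list with a single plain-text annotation.
annotation_list = {
    "resources": [{
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "chars": "Cuius describtio per prouincias et gentes haec est",
        }
    }]
}

print(follow_path(manifest, "sequences / 0 / canvases / 35 / otherContent / 0 / @id"))
print(follow_path(annotation_list, "resources / 0 / resource")["chars"])
```

Note that the numbers in the path are zero-based positions in a JSON array, so `canvases / 35` is the thirty-sixth canvas of the sequence.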
4.4.3 Annotation types
Annotation including plain text (current annotation in our annotationList):
"resource":{ "@id": "http://example.org/iiif/book1/res/comment1.html", "@type": "cnt:ContentAsText", "format": "text/plain", "chars": "Cuius describtio per prouincias et gentes haec est: Lybia, Cyrinaica et<br/><br/>Pentapolis post Aegyptum in parte Affricae prima est. Haec incipit a<br/><br/>ciuitate Parethonio et montibus Catabathmon, inde secundo mari usque ad<br/><br/>aras Philinorum extenditur." }
Annotation linking to an HTML page:
"resource":{ "@id": "http://www1.unipa.it/paolo.monella/reires2019/code/d23sup13r/transcription.html", "@type": "dctypes:Text", "format": "text/html" }
Annotation linking to a (TEI) XML file:
"resource": { "@id": "http://www1.unipa.it/paolo.monella/reires2019/code/d23sup13r/transcription.xml", "@type": "dctypes:Text", "format": "application/tei+xml" }
4.4.4 Hacking the IIIF JSON code: create (edit) an annotation
- Open the annotationList
- If necessary, control-U or right click / View source code
- Select all the JSON annotationList source code (control-A or right click / Select all) and copy it
- Open the online JSON-editor
- Paste the JSON annotationList source code into the left window of the JSON-editor
- Click on Format JSON
- At the top of the right window, select View
- Edit the code in the left window. After each edit, click on Format JSON to get a clearer view of the code in the right window
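The same edit-and-reformat cycle can be tried offline. The following Python sketch is one possible local equivalent of the steps above, not the workshop's own tooling: it parses an annotationList, replaces the `resource` of its first annotation with a link to a TEI XML file (the third annotation type shown in 4.4.3), and prints the result with indentation. The function name `link_to_tei` and the shortened sample annotationList are assumptions made for illustration.

```python
import json

# Shortened, made-up annotationList with one plain-text annotation.
SAMPLE = (
    '{"@type": "sc:AnnotationList", "resources": [{"@type": "oa:Annotation", '
    '"resource": {"@type": "cnt:ContentAsText", "format": "text/plain", '
    '"chars": "Cuius describtio per prouincias et gentes haec est"}}]}'
)


def link_to_tei(annotation_list_json, tei_url):
    """Point the first annotation at a TEI XML file and return formatted JSON."""
    # json.loads raises json.JSONDecodeError on slips such as a missing comma.
    data = json.loads(annotation_list_json)
    data["resources"][0]["resource"] = {
        "@id": tei_url,
        "@type": "dctypes:Text",
        "format": "application/tei+xml",
    }
    # indent=2 plays the role of the "Format JSON" button.
    return json.dumps(data, indent=2, ensure_ascii=False)


print(link_to_tei(
    SAMPLE,
    "http://www1.unipa.it/paolo.monella/reires2019/code/d23sup13r/transcription.xml",
))
```

Because the text is parsed before it is printed again, a syntax error introduced while editing shows up immediately as an exception with a line and column number.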
5. Final project
- Create a new folder project
- Put your TEI XML transcription of folio 13r in that folder. If the OxGarage conversion to HTML (see below) does not work, download the complete TEI XML transcription and use it instead
- Download the Digital facsimile of folio 13r to the same folder
- Convert your TEI XML file to HTML with OxGarage. Instructions:
  - Left: Convert from → Documents → TEI P5 XML Document
  - Right: Convert to → xHTML
  - Left: Select file to convert → button Browse/Sfoglia → select and upload your TEI XML file (mytranscription.xml)
  - Right: Upload images → button Browse/Sfoglia → select and upload the image with the manuscript page facsimile
  - Bottom, center: click on the Convert button
  - In a few seconds, a download dialog window appears → click Save/Salva to save the HTML file mytranscription.html
  - Open the downloaded HTML file (double click; your default browser will open it)
- Save the downloaded HTML file to the project folder
- Open the HTML file with Sublime and edit it as you want (examples: add a new <div> with the translation or any note; change the title; add the names of the curators or a link to the ReIReS website)
- Create a foglio.css file in the same folder
- Connect the HTML file with the foglio.css file by inserting <link rel="stylesheet" href="foglio.css" type="text/css" /> in the <head> of the HTML file
- Edit foglio.css as you want to change the HTML page style
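The stylesheet link can also be added with a short script instead of by hand. This is only a sketch: it assumes the converted page contains a literal `</head>` closing tag and that `foglio.css` sits in the same folder as the HTML file; `add_stylesheet` is a hypothetical helper name and the `page` string is a made-up stand-in for the converted file.

```python
def add_stylesheet(html, href="foglio.css"):
    """Insert a <link> to the stylesheet just before the closing </head> tag."""
    link = f'<link rel="stylesheet" href="{href}" type="text/css" />'
    if link in html:
        return html  # already connected, nothing to do
    if "</head>" not in html:
        raise ValueError("no </head> tag found")
    # Replace only the first occurrence of the closing tag.
    return html.replace("</head>", link + "\n</head>", 1)


# Stand-in for the HTML produced by the conversion step.
page = "<html><head><title>Folio 13r</title></head><body></body></html>"
print(add_stylesheet(page))
```

Running the function twice on the same page leaves it unchanged the second time, so it is safe to re-run after further edits.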
6. Useful links
6.1 XML editors: making writing XML code easier
- Oxygen XML Editor: professional, the standard in the TEI community, paid (with trial period), covers the whole workflow up to the HTML transformation
- XML Copy Editor: open source, free, requires installation
- Online XML Editor: for this workshop only, no installation
6.2 Tools to process TEI XML for visualization
- Oxygen XML Editor (see above)
- EVT (Edition Visualization Technology)
- OxGarage: convert TEI XML to HTML for visualization
- TEI Critical Apparatus Toolbox
  - Display parallel versions: synoptic visualization of textual variants
  - Check your encoding: simple visualization
6.3 Guidelines, specifications and help
- TEI
- IIIF official website
- Drop us an email: fcusimano at ambrosiana.it and
6.4 JSON and IIIF resources
- Online JSON editors
- Hacking Mirador
- Darth Crimson Mirador viewer
- Oxford Bodleian Libraries IIIF manifest editor: click on Open manifest and paste http://213.21.172.53/manifests/public/0b002711800e7d6d.json
- Annotation extractor: click on Load manifest and paste http://213.21.172.53/manifests/public/0b002711800e7d6d.json
7. Suggested readings
7.1 DH and TEI XML
- Orlandi, T. (2010), Informatica testuale. Teoria e prassi, Laterza, Roma.
- Pierazzo, E. (2015), Digital Scholarly Editing: Theories, Models and Methods, Ashgate, Farnham (Surrey, UK) and Burlington (VT, USA), https://hal.archives-ouvertes.fr/hal-01182162/document
- Stella, F. (2018), Testi letterari e analisi digitale, Carocci.
7.2 IIIF
- F. Cusimano, Due esempi di “buone pratiche” nell’uso dei metadati XML. Un’efficace “disseminazione” dei contenuti digitalizzati, C.R.E.L.E.B.- Università Cattolica, Milano, Edizioni CUSL, Milano 2014 (Minima Bibliographica, 19), ISBN: 978-88-8132-7058 https://centridiricerca.unicatt.it/creleb-Cusimano.pdf.
- L. Magnuson, Store and display high resolution images with the International Image Interoperability Framework (IIIF), in «ACRL TechConnect Blog», February 25, 2016, https://acrl.ala.org/techconnect/post/store-and-display-high-resolution-images-with-the-international-image-interoperability-framework-iiif/
- A. Salarelli, International Image Interoperability Framework (IIIF): una panoramica, in «JLIS.it», 8, 1 (January 2017), pp. 50-66, DOI: http://dx.doi.org/10.4403/jlis.it-12090
- F. Cusimano, Biblioteche di conservazione & Data Curation: dal Custos catalogi al Digital Librarian. Il caso della Veneranda Biblioteca Ambrosiana, in «JLIS.it», 10, 1 (January 2019), pp. 125−139. DOI: http://dx.doi.org/10.4403/jlis.it-12513