LOD Workshop Tutorial
Daniele Fusi, Paolo Monella
Venice Summer School in Digital and Public Humanities, Session «Linking the Data: Artists, Artworks and the Semantic Web (LOD)»

In short: what you will do

Step 1
Start from a TEI XML file in which P. Picasso, G. Braque and the city of Paris are mentioned, but in which the semantic markup regarding people is incomplete
Step 2
Complete the semantic markup by including TEI tags about people (external LOD URIs are still missing)
Step 3
Find out LOD URIs for these three entities on DBPedia
Step 4
Add those URIs to the semantic markup of TEI XML file
Step 5
Upload the updated TEI XML file to the toy app
Step 6
Re-generate the HTML visualization based on the updated XML
Step 7
(Re)-run the Parse XML entities function of the toy app, which pulls data about those three entities from the Semantic Web (namely, from DBPedia)

Step 1

What you are doing

You start from a TEI XML file in which P. Picasso, G. Braque and the city of Paris are mentioned, but in which the semantic markup regarding people is incomplete.

Why you are doing it

This is the base file that you will mark up semantically in the next steps. The scenario we are mimicking is a very common one: imagine that, in your research project, you have a TEI-encoded text that you want to enrich with semantic markup.

How to do it

  1. Download the Step 1 version of the TEI XML file (right click / save as, or similar)
  2. Save the file in a folder of your computer where you will be able to find it later
  3. Open it with your XML editor to check that the download worked
  4. Keep it open in your XML editor, for the next step

What to do if anything goes wrong

You need this starting file for the next steps. So call on one of us, and we’ll download it together.

Step 2

What you are doing

You are adding the TEI markup for people (Picasso, Braque) where it is missing.

With this markup, in the <teiHeader> we ‘define’ people (with <person> elements). Then, in the <text>, whenever we find the name of a person (e.g. in the sentence Die Auffassung beispielsweise bei Picasso, Braque...), we mark that name with <name> and add a @ref attribute that points to the definition of the relevant <person> in the <teiHeader>.
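To make the pointing mechanism concrete, here is a minimal Python 3 sketch (standard library only) that follows each @ref from a <name> back to the <person> it points to. The document in the string is a heavily shortened, hypothetical stand-in for the tutorial file (the exact nesting of <persName>, <sourceDesc> etc. is an assumption), and resolve_names is an illustrative helper, not part of the toy app.

```python
import xml.etree.ElementTree as ET

# Namespaces used by TEI elements and by the xml:id attribute.
TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily shortened TEI document: one <person> defined in the
# header and one <name> in the text that points to it via @ref.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <listPerson>
          <person xml:id="p_picasso_pablo">
            <persName><forename>Pablo</forename><surname>Picasso</surname></persName>
          </person>
        </listPerson>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>... of <name ref="#p_picasso_pablo">Picasso</name> ...</p></body></text>
</TEI>"""


def resolve_names(root):
    """Return (name text, forename, surname) for every <name> whose @ref points to a <person>."""
    # Index every <person> in the document by its xml:id.
    persons = {p.get(XML_ID): p for p in root.iter(TEI + "person")}
    resolved = []
    for name in root.iter(TEI + "name"):
        ref = name.get("ref", "")
        # A local pointer looks like "#p_picasso_pablo": drop the leading "#".
        person = persons.get(ref[1:]) if ref.startswith("#") else None
        if person is not None:
            forename = person.findtext(f".//{TEI}forename")
            surname = person.findtext(f".//{TEI}surname")
            resolved.append((name.text, forename, surname))
    return resolved


root = ET.fromstring(SAMPLE)
print(resolve_names(root))  # [('Picasso', 'Pablo', 'Picasso')]
```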

Why you are doing it

Because this ‘internal’ semantic markup strategy, pointing from <name> to <person> within your TEI document, is a common practice.

A side note: the markup you are creating can be called ‘semantic’, since it refers to entities of the (supposedly) real world (people, places etc.). However, it is not yet connected to the Semantic Web (LOD) outside your document. For this, you need to link your <person> elements to the Semantic Web by using LOD URIs: you will do this in the next steps.

How to do it

Step 2/A

  1. Open your initial TEI XML document
  2. Scroll down to <teiHeader> / <fileDesc> / <listPerson> / <person xml:id="p_picasso_pablo">
  3. Take a look at the markup strategy (ask us any questions)
  4. Create another <person> element for Georges Braque just after the one for Pablo Picasso (where the comment <!-- insert-braque-person-element-after-this-comment --> is). Use the Picasso <person> element as a model (copy/paste, then edit)
  5. As the value of the @xml:id attribute of <person>, use xml:id="p_braque_georges" (ids are arbitrary, but the <name> element further down in the TEI file points to this specific xml:id, so it must match exactly)
  6. Type the relevant textual content into the <forename> and <surname> elements for Georges Braque
  7. As you see, the <idno> elements include placeholders (insert-...-here). Leave them as they are, since we’ll replace those placeholders in the next steps:
            <idno type="dbpedia">insert-dbpedia-uri-here</idno>
            <idno type="VIAF">insert-viaf-id-here</idno>
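If it helps to see the ‘copy the Picasso model, then edit’ instruction as code, the following Python 3 sketch does the same on a hypothetical, shortened <listPerson>: it deep-copies the model <person>, changes its xml:id and names, and inserts the copy right after the model. The helper add_person_like is hypothetical, and the <persName> wrapper is an assumption about the file; note that the <idno> placeholders are copied unchanged, as item 7 requires.

```python
import copy
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = "{" + TEI_NS + "}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical <listPerson> containing only the Picasso model element.
LIST_PERSON = f"""<listPerson xmlns="{TEI_NS}">
  <person xml:id="p_picasso_pablo">
    <persName><forename>Pablo</forename><surname>Picasso</surname></persName>
    <idno type="dbpedia">insert-dbpedia-uri-here</idno>
    <idno type="VIAF">insert-viaf-id-here</idno>
  </person>
</listPerson>"""


def add_person_like(list_person, model_id, new_id, forename, surname):
    """Copy the <person> whose xml:id is model_id, edit it, and insert it after the model."""
    children = list(list_person)
    model = next(p for p in children if p.get(XML_ID) == model_id)
    new = copy.deepcopy(model)  # same structure, including the <idno> placeholders
    new.set(XML_ID, new_id)
    new.find(f".//{TEI}forename").text = forename
    new.find(f".//{TEI}surname").text = surname
    list_person.insert(children.index(model) + 1, new)
    return new


lp = ET.fromstring(LIST_PERSON)
add_person_like(lp, "p_picasso_pablo", "p_braque_georges", "Georges", "Braque")
print([p.get(XML_ID) for p in lp])  # ['p_picasso_pablo', 'p_braque_georges']
```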

Step 2/B

  1. Scroll down to the <text> / <body> / ... / <name> elements for Picasso and Braque in the English translation. The relevant text is: “The view, for example, of Picasso, Braque, and today’s School of Paris”
  2. Note the markup strategy (ask us any questions)
  3. Apply the same markup to the corresponding German original text (use the markup in the English translation as model; copy/paste tags, then edit if needed). The relevant text is: “Die Auffassung beispielsweise bei Picasso, Braque und den jetzigen Parisern”
  4. Topic for discussion: in the original German text, Klee does not say “and today’s School of Paris” but “und den jetzigen Parisern” (literally, “and today’s Parisians”), i.e. the group of inhabitants/people, not the city per se. Should our semantic markup point to a <persGroup> (as we do in the German text) or to a <place> element (as we do in the English translation)? Why?
  5. Save the XML file

Shorter/longer version

Longest version
Insert both <person> (step 2/A) and <name> (step 2/B) elements
Shorter version
Only insert one of them, i.e. either <person> (step 2/A) or one of the <name> elements (step 2/B)
Shortest version
Skip this step and see What to do if anything goes wrong below

What to do if anything goes wrong

Download the Step 2 version of the TEI XML file (right click / save as), in which all <person> and <name> markup has already been included, so you are ready for the next steps.

Please note where you downloaded this file. You can replace the previous version.

Step 3

What you are doing

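You find out the LOD identifiers that will replace the placeholders in the <person> elements: the DBPedia URIs and the VIAF ids of Picasso and Braque.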

Why you are doing it

So that, in the next step, you can replace the placeholders in the TEI XML file with the relevant URIs (thus linking your ‘internal’ semantic markup to the Semantic Web).

How to do it

“I’m feeling lucky” approach

  1. DBPedia URIs look like https://dbpedia.org/resource/School_of_Paris
  2. Replace School_of_Paris with the most relevant (capitalized English) word or phrase (with underscores), and cross your fingers
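The “I’m feeling lucky” guess can be written as a tiny Python function. It is a sketch under one assumption: that, as with Wikipedia page titles, only the first letter is forced to upper case and spaces become underscores. It does not handle punctuation or non-ASCII characters, and the guessed URI may simply not exist, so always open it in your browser to check.

```python
def guess_dbpedia_uri(label: str) -> str:
    """Build a candidate DBPedia resource URI from an English label."""
    # Collapse runs of whitespace into single underscores:
    # "School of Paris" -> "School_of_Paris".
    slug = "_".join(label.split())
    # Assumption: only the first letter is forced to upper case (Wikipedia-style titles).
    slug = slug[:1].upper() + slug[1:]
    return "https://dbpedia.org/resource/" + slug


print(guess_dbpedia_uri("Georges Braque"))  # https://dbpedia.org/resource/Georges_Braque
```

Note that DBPedia identifies its resources with the http:// form (e.g. http://dbpedia.org/resource/Pablo_Picasso), which is also the form its SPARQL endpoint returns and the one used in the TEI file in Step 4.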

“That’s how it’s done, kid!” approach

  1. Go to https://dbpedia.org/sparql
  2. Paste a SPARQL query like this one, replacing Paris with the label you are searching for, and click ‘Execute Query’:
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT ?uri ?label
      WHERE {
        ?uri rdfs:label ?label .
        FILTER(?label = "Paris"@en)
      }
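The same query can also be sent from a script instead of the web form. A minimal Python 3 sketch using only the standard library: it builds the query for any label (escaping quotes), sends it to https://dbpedia.org/sparql with an HTTP GET, asks for the standard SPARQL JSON results format through the Accept header, and extracts the (uri, label) pairs. The helper names are hypothetical, and run_query needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"


def label_query(label: str, lang: str = "en") -> str:
    """Build the label-lookup query above for an arbitrary label and language tag."""
    # Escape backslashes and double quotes so the label cannot break the string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?uri ?label WHERE {\n"
        "  ?uri rdfs:label ?label .\n"
        f'  FILTER(?label = "{escaped}"@{lang})\n'
        "}"
    )


def parse_bindings(results: dict) -> list:
    """Turn a SPARQL JSON results document into a list of (uri, label) pairs."""
    return [(b["uri"]["value"], b["label"]["value"])
            for b in results["results"]["bindings"]]


def run_query(query: str) -> dict:
    """Send the query with an HTTP GET and decode the JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, parse_bindings(run_query(label_query("Paris"))) lists the DBPedia resources whose English label is exactly “Paris”. The results depend on the current DBPedia data, so you may still have to pick the right candidate by hand.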

“In real life” approach: Recogito

  1. Use a NER (Named Entity Recognition) tool such as Recogito
  2. Register by creating a free account, or log in
  3. Upload your TEI XML document: click on ‘New’ (top left)
  4. Click on the file in the list
  5. Click on ‘Options’ (top right)
  6. Select Named Entity Recognition / Stanford CoreNLP en / Start NER (bottom)
  7. Then double-click on the file in the list to open it
  8. Click on highlighted words/phrases to review/edit the semantic markup

Shorter/longer version

Longest version
Find out all LOD URIs
Shorter versions
Only find out one or two, then go to the What to do if anything goes wrong section

What to do if anything goes wrong

Find the relevant URIs in the How to do it section of the next step and skip to the next step.

Step 4

What you are doing

Add the relevant URIs to your TEI markup (<person> elements).

Why you are doing it

So the semantic markup of your TEI XML file is actually linked to the Semantic Web.

How to do it

  1. Find the insert-dbpedia-uri-here and the insert-viaf-id-here placeholders in the TEI XML file
  2. Replace them with the relevant identifiers:
            Type          For entity   Value
            DBPedia URI   Picasso      http://dbpedia.org/resource/Pablo_Picasso
            VIAF id       Picasso      15873
            DBPedia URI   Braque       http://dbpedia.org/resource/Georges_Braque
            VIAF id       Braque       9867924
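The replacement can also be scripted. A minimal Python 3.8+ sketch, assuming the <idno> elements sit inside the <person> elements with the type values dbpedia and VIAF shown in Step 2: fill_placeholders (a hypothetical helper) only overwrites text that still starts with insert-, and keeps comments when re-serializing. ElementTree does not round-trip every detail of a file (for instance, the XML declaration and processing instructions such as a schema link are not written back), so for this workshop editing in your XML editor remains the safer route.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = "{" + TEI_NS + "}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Values from the table above, keyed by the xml:id of each <person>.
LOD_IDS = {
    "p_picasso_pablo": {"dbpedia": "http://dbpedia.org/resource/Pablo_Picasso", "VIAF": "15873"},
    "p_braque_georges": {"dbpedia": "http://dbpedia.org/resource/Georges_Braque", "VIAF": "9867924"},
}


def fill_placeholders(xml_text: str, ids: dict) -> str:
    """Replace insert-...-here placeholders in <person>/<idno> elements with real identifiers."""
    # Keep comments in the tree instead of silently dropping them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    for person in root.iter(TEI + "person"):
        values = ids.get(person.get(XML_ID), {})
        for idno in person.iter(TEI + "idno"):
            value = values.get(idno.get("type"))
            # Only overwrite placeholders, never an identifier that is already there.
            if value and (idno.text or "").startswith("insert-"):
                idno.text = value
    ET.register_namespace("", TEI_NS)  # write TEI as the default namespace (no ns0: prefix)
    return ET.tostring(root, encoding="unicode")
```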

Shorter/longer version

Longer version
Insert the URIs for all entities
Shorter versions
Only insert some URIs, then skip to the What to do if anything goes wrong section

What to do if anything goes wrong

Download the Step 3 version of the TEI XML file (right click / save as), in which all URIs have been inserted, so you are ready for the next steps.

Please note where you downloaded this file. You can replace the previous version.

Step 5

What you are doing

You upload the updated TEI XML file to the toy app.

Why you are doing it

The toy app starts from a default version of our TEI XML file, stored on its server.

You now want to provide it with the updated version, so you can re-generate the HTML visualization and pull information from the Semantic Web in the next steps.

How to do it

  1. Make sure that you have a working, valid and complete TEI XML file
  2. If you’re not sure, or if anything is wrong or incomplete in your TEI XML file, download the Step 3 version of it
  3. Open the toy app with your browser and locate the button bar, just below the XML and XSLT code windows
  4. Upload the updated TEI XML file by clicking on the Load XML from file button
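Before uploading, a quick local check catches the two most common problems: a file that is not well-formed, and leftover insert-...-here placeholders. A minimal Python 3 sketch using only the standard library; check_tei is a hypothetical helper name, and it does not validate against the TEI schema (your XML editor can do that).

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def check_tei(xml_data):
    """Return a list of problems (xml_data may be str or bytes); empty means both checks passed."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != TEI + "TEI":
        problems.append(f"unexpected root element {root.tag}")
    # Any <idno> still holding an insert-...-here placeholder needs to be filled in.
    for idno in root.iter(TEI + "idno"):
        if (idno.text or "").startswith("insert-"):
            problems.append(f"placeholder {idno.text!r} in idno type={idno.get('type')!r}")
    return problems
```

For example, check_tei(open("my-tei-file.xml", "rb").read()) returns an empty list when both checks pass (the file name is a placeholder for wherever you saved your file).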

What to do if anything goes wrong

Call on us.

Step 6

What you are doing

You re-generate the HTML visualization based on your updated XML and on the XSLT stylesheet.

Why you are doing it

So you can see that the visualization does not change in appearance, even though the XML has been updated. Why is that?

How to do it

  1. Click on the Transform XML with XSLT button
  2. Check the visualization that appears in the HTML window of the toy app

What to do if anything goes wrong

This step is not directly relevant to LOD: just skip to the next step.

Step 7

What you are doing

You (re)-run the “Parse XML entities” function of the toy app, which pulls data about those three entities from the Semantic Web (namely, from DBPedia).
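How the toy app implements this function is not described here. As a hypothetical sketch of the first half of such a feature, the Python 3 function below collects, for every element with an xml:id, the DBPedia URI stored in a child <idno type="dbpedia">, skipping placeholders. It assumes that places (such as Paris) carry their DBPedia URI the same way as the <person> elements do; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dbpedia_entities(xml_data):
    """Map the xml:id of every element carrying an <idno type="dbpedia"> to that URI."""
    root = ET.fromstring(xml_data)
    entities = {}
    for element in root.iter():
        entity_id = element.get(XML_ID)
        if entity_id is None:
            continue
        # Only direct <idno> children, so nested entities are not mixed up.
        for idno in element.findall(TEI + "idno"):
            uri = (idno.text or "").strip()
            # Placeholders such as insert-dbpedia-uri-here are skipped.
            if idno.get("type") == "dbpedia" and uri.startswith("http"):
                entities[entity_id] = uri
    return entities
```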

Why you are doing it

So you can see the power of LOD in action!

How to do it

  1. Click on the Parse XML entities button
  2. Check the entities list in the new window that has appeared just above the map
  3. Click on the Fly to this location button in the Paris row and check the map. Where does the geographical information (latitude, longitude) come from? Is it encoded in the TEI XML file?
  4. Click on the View details button for each entity and check the Details window that appears. Where does this information come from? Is it encoded in the TEI XML file?
  5. Switch the language of the abstract in the dropdown menu
  6. Does the system show any geographical information for people (Picasso and Braque)? Why?
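To see where details such as abstracts and coordinates can come from, here is a hypothetical Python 3 sketch: for one DBPedia URI it asks for the abstract in a chosen language and, inside an OPTIONAL block, for the latitude and longitude (DBPedia’s geo:lat and geo:long properties), then summarizes the JSON results. This is not the toy app’s actual code; the helper names are assumptions, and fetch needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"


def entity_query(uri: str, lang: str = "en") -> str:
    """Query one entity's abstract in `lang` plus its coordinates, if it has any."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n"
        "SELECT ?abstract ?lat ?long WHERE {\n"
        f"  <{uri}> dbo:abstract ?abstract .\n"
        f'  FILTER(lang(?abstract) = "{lang}")\n'
        # OPTIONAL: entities without coordinates still return their abstract.
        f"  OPTIONAL {{ <{uri}> geo:lat ?lat ; geo:long ?long . }}\n"
        "}"
    )


def summarize(results: dict) -> dict:
    """Return the first row's abstract and coordinates; missing values become None."""
    bindings = results["results"]["bindings"]
    first = bindings[0] if bindings else {}
    summary = {var: first.get(var, {}).get("value") for var in ("abstract", "lat", "long")}
    # SPARQL JSON results encode every value as a string; turn coordinates into floats.
    for var in ("lat", "long"):
        if summary[var] is not None:
            summary[var] = float(summary[var])
    return summary


def fetch(query: str) -> dict:
    """Send the query to the DBPedia endpoint (HTTP GET) and decode the JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, summarize(fetch(entity_query("http://dbpedia.org/resource/Paris", "de"))) would return the German abstract of Paris and its coordinates. Changing the lang argument mirrors the language dropdown; when DBPedia returns several rows, this sketch only keeps the first one.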

What to do if anything goes wrong

Call on us.