Daniele Fusi Paolo Monella LOD Workshop Tutorial Venice Summer School in Digital and Public Humanities, Session «Linking the Data: Artists, Artworks and the Semantic Web (LOD)»
In short: what you will do
- Step 1: Start from a TEI XML file in which P. Picasso, G. Braque and the city of Paris are mentioned, but in which the semantic markup regarding people is incomplete
- Step 2: Complete the semantic markup by including TEI tags about people (external LOD URIs are still missing)
- Step 3: Find out LOD URIs for these three entities on DBPedia
- Step 4: Add those URIs to the semantic markup of the TEI XML file
- Step 5: Upload the updated TEI XML file to the toy app
- Step 6: Re-generate the HTML visualization based on the updated XML
- Step 7: (Re)-run the Parse XML entities function of the toy app, which pulls data about those three entities from the Semantic Web (namely, from DBPedia)
Step 1
What you are doing
You start from a TEI XML file in which P. Picasso, G. Braque and the city of Paris are mentioned, but in which the semantic markup regarding people is incomplete.
Why you are doing it
This is the base file that you will mark up semantically in the next steps. The scenario we are mimicking is a very common one: imagine that, in your research project, you have a TEI-encoded text that you want to enrich with semantic markup.
How to do it
- Download the Step 1 version of the TEI XML file (right click / save as, or similar)
- Save the file in a folder of your computer where you will be able to find it later
- Open it with your XML editor to check that the download worked
- Keep it open in your XML editor, for the next step
What to do if anything goes wrong
You need this starting file for the next steps. So call on one of us, and we’ll download it together.
Step 2
What you are doing
You are adding the TEI markup for people (Picasso, Braque) where it is missing.
With this markup, in the <teiHeader> we ‘define’ people (with <person> elements). Then, in the <text>, when we find the name of a person (e.g. in the sentence Die Auffassung beispielsweise bei Picasso, Braque...), we mark that name with <name> and a @ref attribute that points to the definition of the relevant <person> in the <teiHeader>.
More specifically:
- The <teiHeader> / <fileDesc> / <listPerson> / <person> element for Picasso is already in the file. You add the <teiHeader> / <fileDesc> / <listPerson> / <person> element for Braque (without the actual LOD URIs from DBPedia and VIAF, at this step)
- The <text> / <body> / ... / <name> elements for Picasso and Braque in the English translation are already in the file. You add the <text> / <body> / ... / <name> elements for Picasso and Braque in the German original text
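To make the pointer mechanism concrete, here is a minimal Python sketch that follows each <name> @ref back to its <person>. The TEI fragment is a hypothetical, heavily simplified stand-in for the workshop file: the <persName> wrapper and the `#` prefix in the @ref value are assumptions based on common TEI practice, not something confirmed by the tutorial file.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, simplified fragment (NOT the actual workshop file).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <listPerson>
      <person xml:id="p_picasso_pablo">
        <persName><forename>Pablo</forename> <surname>Picasso</surname></persName>
      </person>
    </listPerson>
  </teiHeader>
  <text><body>
    <p>The view, for example, of <name ref="#p_picasso_pablo">Picasso</name></p>
  </body></text>
</TEI>"""


def resolve_names(tei_source: str) -> list[tuple[str, str]]:
    """Pair each <name> in the text with the full name of the <person> it points to."""
    root = ET.fromstring(tei_source)

    # Index every <person> by its xml:id.
    people = {}
    for person in root.iter(f"{{{TEI_NS}}}person"):
        forename = person.findtext(f".//{{{TEI_NS}}}forename", default="")
        surname = person.findtext(f".//{{{TEI_NS}}}surname", default="")
        people[person.get(XML_ID)] = f"{forename} {surname}".strip()

    # Follow each @ref (assumed to look like "#some_id") to that index.
    resolved = []
    for name in root.iter(f"{{{TEI_NS}}}name"):
        target = name.get("ref", "").lstrip("#")
        resolved.append((name.text or "", people.get(target, "(unresolved)")))
    return resolved


print(resolve_names(SAMPLE_TEI))
```

The point of the sketch is only that the string in @ref and the string in xml:id must match for the link to hold.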
Why you are doing it
Because this ‘internal’ semantic markup strategy, pointing from <name> to <person> within your TEI document, is a common practice.
A side note: the markup you are creating can be defined as ‘semantic’, since it refers to entities of the (supposedly) real world (people, places etc.). However, it is not yet connected to the Semantic Web (LOD) external to your document. For this, you need to link your <person> elements to the Semantic Web by using LOD URIs: you will do this in the next steps.
How to do it
Step 2/A
- Open your initial TEI XML document
- Scroll down to <teiHeader> / <fileDesc> / <listPerson> / <person xml:id="p_picasso_pablo">
- Take a look at the markup strategy (ask us any questions)
- Create another <person> element for Georges Braque just after the one for Pablo Picasso (where the comment <!-- insert-braque-person-element-after-this-comment --> is). Use the Picasso <person> element as a model (copy/paste, then edit)
- As value of the @xml:id attribute of <person>, use: xml:id="p_braque_georges" (ids are arbitrary, but the <name> element further down in the TEI file will point to this specific xml:id)
- Type the relevant textual content into the <forename> and <surname> elements for Georges Braque
- As you see, the <idno> elements include placeholders (insert-...-here). Leave them as they are, since we’ll replace those placeholders in the next steps:
<idno type="dbpedia">insert-dbpedia-uri-here</idno>
<idno type="VIAF">insert-viaf-id-here</idno>
- → Save the XML file
Step 2/B
- Scroll down to the <text> / <body> / ... / <name> elements for Picasso and Braque in the English translation. The relevant text is: “The view, for example, of Picasso, Braque, and today’s School of Paris”
- Note the markup strategy (ask us any questions)
- Apply the same markup to the corresponding German original text (use the markup in the English translation as model; copy/paste tags, then edit if needed). The relevant text is: “Die Auffassung beispielsweise bei Picasso, Braque und den jetzigen Parisern”
- Topic for discussion: in the original German text, Klee does not say “and today’s school from Paris” but “und den jetzigen Parisern” (literally, “and today’s Parisians”), i.e. the group of inhabitants/people, not the city per se. Should our semantic markup point to a <persGroup> (as we do in the German text) or a <place> element (as we do in the English translation)? Why?
- Save the XML file
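Once both the <person> and the <name> elements are in place, a quick sanity check is to look for @ref values that point nowhere. The following is a minimal sketch, assuming that local pointers are written as `#` plus an xml:id (an assumption about the file, in line with common TEI practice); the sample fragment is hypothetical and deliberately leaves out the Braque <person>.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: Braque is referenced in the text but not yet defined.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <listPerson>
      <person xml:id="p_picasso_pablo"/>
    </listPerson>
  </teiHeader>
  <text><body>
    <p>Die Auffassung beispielsweise bei
      <name ref="#p_picasso_pablo">Picasso</name>,
      <name ref="#p_braque_georges">Braque</name></p>
  </body></text>
</TEI>"""


def dangling_refs(tei_source: str) -> list[str]:
    """Return local @ref values that match no xml:id in the document."""
    # fromstring raises xml.etree.ElementTree.ParseError if the file is not well-formed.
    root = ET.fromstring(tei_source)
    known_ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}

    missing = []
    for el in root.iter():
        ref = el.get("ref", "")
        # Only local pointers ("#id") are checked; external URIs are ignored.
        if ref.startswith("#") and ref[1:] not in known_ids:
            missing.append(ref)
    return missing


print(dangling_refs(SAMPLE_TEI))
```

An empty list means every local pointer resolves. It does not mean the file is valid TEI, which only a schema-aware XML editor can tell you.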
Shorter/longer version
- Longest version: Insert both <person> (step 2/A) and <name> (step 2/B) elements
- Shorter version: Only insert one of them, i.e. either <person> (step 2/A) or one of the <name> elements (step 2/B)
- Shortest version: Skip this step and see What to do if anything goes wrong below
What to do if anything goes wrong
Download the Step 2 version of the TEI XML file (right click / save as), in which all <person> and <name> markup has already been included, so you are ready for the next steps.
Please note where you downloaded this file. You can replace the previous version.
Step 3
What you are doing
You find out the following LOD URIs:
- For Picasso and Braque: DBPedia URI and VIAF id
- For Paris: DBPedia URI only
Why you are doing it
So, in the next steps, you can replace the placeholders with the relevant URIs, in the TEI XML file (thus linking your ‘internal’ semantic markup to the Semantic Web).
How to do it
“I’m feeling lucky” approach
- DBPedia URIs look like https://dbpedia.org/resource/School_of_Paris
- Replace School_of_Paris with the most relevant (capitalized English) word or phrase (with underscores), and cross your fingers
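The guessing rule above can be written down as a tiny Python sketch. The function name `guess_dbpedia_uri` is hypothetical, and the result is only a guess: nothing here checks that the resource actually exists on DBPedia.

```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "https://dbpedia.org/resource/"


def guess_dbpedia_uri(label: str) -> str:
    """Turn a capitalized English label into a candidate DBPedia URI."""
    # Collapse whitespace and join the words with underscores.
    slug = "_".join(label.split())
    # Percent-encode anything unusual, keeping parentheses and commas readable.
    return DBPEDIA_RESOURCE + quote(slug, safe="(),")


for label in ["School of Paris", "Pablo Picasso", "Georges Braque"]:
    print(guess_dbpedia_uri(label))
```

Open the resulting URI in your browser to see whether the guess was lucky.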
“That’s how it’s done, kid!” approach
- Go to https://dbpedia.org/sparql
- Paste a SPARQL query like this, after replacing Paris with the ‘label’ you are searching for, and hit ‘Execute Query’:
SELECT ?uri ?label
WHERE {
  ?uri rdfs:label ?label .
  FILTER(?label = "Paris"@en)
}
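The same query can also be sent from a script instead of the web form. Below is a minimal Python sketch using only the standard library. It assumes that the endpoint accepts `query` and `format` as GET parameters and answers in the standard SPARQL JSON results layout. The function names are hypothetical, and `find_uris` needs network access, so it is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://dbpedia.org/sparql"


def build_query(label: str, lang: str = "en") -> str:
    """Build the label lookup query shown above for an arbitrary label."""
    # Escape backslashes and double quotes so the label cannot break the string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?uri ?label WHERE {\n"
        "  ?uri rdfs:label ?label .\n"
        f'  FILTER(?label = "{escaped}"@{lang})\n'
        "}"
    )


def build_request_url(label: str) -> str:
    """Encode the query as GET parameters (parameter names are an assumption)."""
    params = {"query": build_query(label), "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


def extract_uris(results: dict) -> list[str]:
    """Pull the ?uri values out of a SPARQL JSON result document."""
    return [binding["uri"]["value"] for binding in results["results"]["bindings"]]


def find_uris(label: str) -> list[str]:
    """Run the query against the live endpoint (requires network access)."""
    with urlopen(build_request_url(label), timeout=30) as response:
        return extract_uris(json.load(response))


print(build_query("Paris"))
```

A label lookup like this can return several URIs (for example category or redirect pages), so you still have to pick the right one by eye.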
“In real life” approach: Recogito
- Use an NER (Named Entity Recognition) tool such as Recogito
- Register by creating a free account, or log in
- Upload your TEI XML document: click on ‘New’ (top left)
- Click on the file in the list
- Click on ‘Options’ (top right)
- Select Named Entity Recognition / Stanford CoreNLP en / Start NER (bottom)
- Then double-click on the file in the list to open it
- Click on highlighted words/phrases to review/edit the semantic markup
Shorter/longer version
- Longest version: Find out all LOD URIs
- Shorter versions: Only find out one or two, then go to the What to do if anything goes wrong section
What to do if anything goes wrong
Find the relevant URIs in the How to do it section of the next step and skip to the next step.
Step 4
What you are doing
Add the relevant URIs to your TEI markup (<person> elements).
Why you are doing it
So the semantic markup of your TEI XML file is actually linked to the Semantic Web.
How to do it
- Find the insert-dbpedia-uri-here and the insert-viaf-id-here placeholders in the TEI XML file
- Replace them with the relevant URIs, i.e.
| Type | For entity | URI / id |
|---|---|---|
| DBPedia URI | Picasso | http://dbpedia.org/resource/Pablo_Picasso |
| VIAF id | Picasso | 15873 |
| DBPedia URI | Braque | http://dbpedia.org/resource/Georges_Braque |
| VIAF id | Braque | 9867924 |
- → Save the XML file
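In the workshop you replace the placeholders by hand, but in a larger project the same substitution could be scripted. The following is a minimal Python sketch under stated assumptions: the sample fragment is a hypothetical simplification of the workshop file, the <persName> wrapper is assumed, and the identifier values are the ones from the table above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Values copied from the table above, keyed by xml:id and by <idno> @type.
IDENTIFIERS = {
    "p_picasso_pablo": {"dbpedia": "http://dbpedia.org/resource/Pablo_Picasso", "VIAF": "15873"},
    "p_braque_georges": {"dbpedia": "http://dbpedia.org/resource/Georges_Braque", "VIAF": "9867924"},
}

# Hypothetical, simplified fragment (NOT the actual workshop file).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <listPerson>
      <person xml:id="p_braque_georges">
        <persName><forename>Georges</forename> <surname>Braque</surname></persName>
        <idno type="dbpedia">insert-dbpedia-uri-here</idno>
        <idno type="VIAF">insert-viaf-id-here</idno>
      </person>
    </listPerson>
  </teiHeader>
</TEI>"""


def fill_idnos(tei_source: str) -> str:
    """Replace insert-...-here placeholders in <idno> with the known identifiers."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace
    root = ET.fromstring(tei_source)
    for person in root.iter(f"{{{TEI_NS}}}person"):
        values = IDENTIFIERS.get(person.get(XML_ID), {})
        for idno in person.iter(f"{{{TEI_NS}}}idno"):
            new_value = values.get(idno.get("type"))
            # Only touch elements that still hold a placeholder.
            if new_value and (idno.text or "").startswith("insert-"):
                idno.text = new_value
    return ET.tostring(root, encoding="unicode")


print(fill_idnos(SAMPLE_TEI))
```

Note that ElementTree's default parser drops XML comments, so a script like this is not a drop-in substitute for careful editing in your XML editor.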
Shorter/longer version
- Longer version: Insert the URIs for all entities
- Shorter versions: Only insert some URIs, then skip to the What to do if anything goes wrong section
What to do if anything goes wrong
Download the Step 3 version of the TEI XML file (right click / save as), in which all URIs have been inserted, so you are ready for the next steps.
Please note where you downloaded this file. You can replace the previous version.
Step 5
What you are doing
You upload the updated TEI XML file to the toy app.
Why you are doing it
The toy app starts from a default version of our TEI XML file, which it finds on its server.
You now want to provide it with the updated version, so you can re-generate the HTML visualization and pull information from the Semantic Web in the next steps.
How to do it
- Make sure that you have a working, valid and complete TEI XML file
- If you’re not sure, or if anything is wrong or incomplete in your TEI XML file, download the Step 3 version of it
- Open the toy app with your browser and locate the button bar, just behind the XML and XSLT code windows
- Upload the updated TEI XML file by clicking on the Load XML from file button
What to do if anything goes wrong
Call on us.
Step 6
What you are doing
You re-generate the HTML visualization based on your updated XML, and the XSLT.
Why you are doing it
So you can see that the updated translation does not change in appearance. Why is that?
How to do it
- Click on the Transform XML with XSLT button
- Check the visualization that appears in the HTML window of the toy app
What to do if anything goes wrong
This step is not directly relevant to LOD: just skip to the next step.
Step 7
What you are doing
You (re)-run the “Parse XML entities” function of the toy app, which pulls data about those three entities from the Semantic Web (namely, from DBPedia).
Why you are doing it
So you can see the power of LOD in action!
How to do it
- Click on the Parse XML entities button
- Check the entities list in the new window that has appeared just above the map
- Click on the Fly to this location button in the Paris row and check the map. Where does the geographical information (latitude, longitude) come from? Is it encoded in the TEI XML file?
- Click on the View details button for each entity and check the Details window that appears. Where does this information come from? Is it encoded in the TEI XML file?
- Switch the language of the abstract in the dropdown menu
- Does the system show any geographical information for people (Picasso and Braque)? Why?
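To think about the questions above, it can help to look at what a LOD source returns for an entity. We do not know how the toy app is implemented, so the following Python sketch is only an illustration. It assumes a response shaped like RDF/JSON (resource URI, then property URI, then a list of values), and the sample data is hand-written with illustrative values, not fetched from DBPedia.

```python
from typing import Optional

GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
DBO_ABSTRACT = "http://dbpedia.org/ontology/abstract"

PARIS = "http://dbpedia.org/resource/Paris"
PICASSO = "http://dbpedia.org/resource/Pablo_Picasso"

# Hand-written sample in an RDF/JSON-like shape (illustrative values only).
SAMPLE = {
    PARIS: {
        GEO_LAT: [{"type": "literal", "value": 48.8567}],
        GEO_LONG: [{"type": "literal", "value": 2.3508}],
        DBO_ABSTRACT: [
            {"type": "literal", "value": "Paris is the capital of France.", "lang": "en"},
            {"type": "literal", "value": "Paris ist die Hauptstadt Frankreichs.", "lang": "de"},
        ],
    },
    PICASSO: {
        DBO_ABSTRACT: [
            {"type": "literal", "value": "Pablo Picasso was a Spanish painter.", "lang": "en"},
        ],
    },
}


def coordinates(data: dict, uri: str) -> Optional[tuple[float, float]]:
    """Return (latitude, longitude) if the resource carries both, else None."""
    props = data.get(uri, {})
    lat, lon = props.get(GEO_LAT), props.get(GEO_LONG)
    if not lat or not lon:
        return None
    return float(lat[0]["value"]), float(lon[0]["value"])


def abstract(data: dict, uri: str, lang: str) -> Optional[str]:
    """Return the abstract in the requested language, if there is one."""
    for literal in data.get(uri, {}).get(DBO_ABSTRACT, []):
        if literal.get("lang") == lang:
            return literal["value"]
    return None


print(coordinates(SAMPLE, PARIS), coordinates(SAMPLE, PICASSO))
print(abstract(SAMPLE, PARIS, "de"))
```

In this sample, none of the displayed information lives in the TEI file: the TEI only holds the URI, and everything else is looked up under that URI.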
What to do if anything goes wrong
Call on us.