At the moment, pybtex is not well documented. The main source of documentation, apart from the source code itself, is pybtex's home page, and especially The Friendly Manual of Pybtex.
I'm not good enough at Python to extract from pybtex's source code all the knowledge necessary to write scripts based on pybtex. But I did take a look at the source code and did some experiments, and this page is a summary of what I understood about how to use pybtex. Of course, most of this page comes from guessing, so I might be utterly wrong about a number of things.
For comments or corrections, see my contacts on my home page.
This is what the pybtex bibliographical entry data type looks like (the indent is mine):
    BibliographyData(
      entries=OrderedCaseInsensitiveDict({
        'adams93': Entry('article',
          fields={
            'volume': '4',
            'title': 'The title of the work',
            'journal': 'The name of the journal',
            'number': '2',
            'month': '7',
            'note': 'An optional note',
            'year': '1993',
            'pages': '201-213'
          },
          persons={
            'author': [Person(u'Adams, Peter')]
          }
        )
      }),
      preamble=[]
    )
This is how you initialize the parser:
    from pybtex.database.input import bibtex

    parser = bibtex.Parser()
    bib_data = parser.parse_file('myfile.bibtex')
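At this point, bib_data is an object including the whole bibliography, and bib_data.entries is not a tuple but a dictionary-like object (an OrderedCaseInsensitiveDict, judging from the dump above). What I'm sure of is that it's iterable, and that iterating over it yields the bibtex keys: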
The following code

    for e in bib_data.entries:
        print(bib_data.entries[e])

prints the entries (unformatted) one by one.
Each key e yielded by the previous for loop can be used to look up bib_data.entries[e], where:
    e                      is the bibtex key (e.g. adams93);
    bib_data.entries[e]    is an 'Entry' object.
The round parentheses in the 'Entry' element above might suggest a tuple, but an 'Entry' is not one. In fact, the code

    print(bib_data.entries[e][0])

doesn't work. It raises the error

    TypeError: 'Entry' object does not support indexing
However, this 'Entry' object has some attributes, which are described below.
bib_data.entries[e].type is the entry type (electronic, book, article, incollection).
bib_data.entries[e].fields is a dictionary including all fields except 'author' or 'editor'. Pybtex manages the content of the 'author' or 'editor' field not as a string, but as a list of 'Person' objects.
Also the type of the entry (book, article, etc.) is not included in the fields dictionary.
The keys of the fields dictionary are 'title', 'year' etc. (case insensitive).
The code

    bib_data.entries[e].fields[u'year']

yields something like '1993' (a string).
The persons attribute is an object of the type OrderedCaseInsensitiveDict, i.e. a specific kind of dictionary. Its keys are 'editor' or 'author', while its values are lists of 'Person' objects.
Basically, bib_data.entries[e].persons is a dictionary, so

    print(bib_data.entries[e].persons[u'Editor'])

yields something like

    [Person(u'Trevisan, M.'), Person(u'Gigliozzi, G.')]

However, it raises a KeyError if the bibliographical entry does not include an 'editor' field but, for example, an 'author' field (or neither 'editor' nor 'author').
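One way to avoid the KeyError is to wrap the lookup in a small helper that falls back to an empty list. This is only a sketch: the function name people() is made up, and a plain dict holding strings stands in for bib_data.entries[e].persons so that the example runs without a bibliography.

```python
def people(persons, role):
    """Return the list stored under `role`, or an empty list if it is missing."""
    try:
        return persons[role]
    except KeyError:
        return []


# A plain dict standing in for bib_data.entries[e].persons (assumption:
# the real object raises KeyError for a missing role, as described above).
sample_persons = {'author': ['Adams, Peter']}

authors = people(sample_persons, 'author')
editors = people(sample_persons, 'editor')
print(authors, editors)
```

With a real entry, the call would look like people(bib_data.entries[e].persons, 'editor'), and a loop over the result simply does nothing when the entry has no editor.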
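p.first(), where p is one of the 'Person' objects in a list like bib_data.entries[e].persons['editor'], is itself a list, with an element for each first name of the author/editor. If there's only one first name, it's still a list, including one element only.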
I'm not sure what pybtex does when there are many first names (e.g. George Walker Bush) or many last names. I know for sure that pybtex treats cases like "C.M." (no space in between) or "Jean-Daniel" (middle dash, no space in between) as one Unicode string ("Jean-Daniel" is not split into "Jean" and "Daniel").
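The following code yields lists with other parts of the name. Note the [0]: persons['editor'] is a list, and these methods belong to each 'Person' in it, not to the list. As far as I can tell, the 'von' part is called prelast() and the 'junior' part lineage():

    bib_data.entries[e].persons['editor'][0].middle()
    bib_data.entries[e].persons['editor'][0].last()
    bib_data.entries[e].persons['editor'][0].prelast()   # the 'von' part
    bib_data.entries[e].persons['editor'][0].lineage()   # the 'junior' part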
As the aforementioned pieces of code return lists (not strings), if you want to print a name, you need to use join(), like this (p being a 'Person' object; joining with a space keeps multi-part names readable):

    print(' '.join(p.first()), ' '.join(p.last()))
What if you want to search among names?

    if 'Monella' in p.last():

matches 'Monella' if 'Monella' is the whole family name, but

    if 'ella' in p.last():

doesn't match 'Monella', because 'in' compares 'ella' with each whole element of the list. In this case you have to use:

    if 'ella' in ''.join(p.last()):

which matches both 'Monella' and 'Stella'.
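The difference between the two tests can be seen in isolation with plain lists of strings standing in for what p.last() returns. The helper name name_contains() is made up for this sketch.

```python
def name_contains(name_parts, fragment):
    """Return True if `fragment` occurs anywhere in the joined name parts."""
    return fragment in ' '.join(name_parts)


# Lists of strings standing in for p.last() of three different persons.
last_names = [['Monella'], ['Stella'], ['Adams']]

# 'in' on a list only matches whole elements, so nothing is found here.
exact = [parts for parts in last_names if 'ella' in parts]

# Joining first turns the test into a substring search.
partial = [parts for parts in last_names if name_contains(parts, 'ella')]

print(exact, partial)
```

The helper joins with a space rather than an empty string so that a fragment cannot match across the boundary between two name parts by accident; that is a design choice of this sketch, not something pybtex requires.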
I believe each element i of bib_data.entries.items() is a tuple, in which:
    i[0]    is the bibtex key (e.g. adams93). As we saw above, you can get that key also with:

                for e in bib_data.entries:
                    print(e)

            This 'for' loop will print a list of all bibtex keys in the bibliography.
    i[1]    is the whole 'Entry' object (for which see above), which in its turn includes three objects: the type, the fields and the persons.
From my experiments, it seems that there is no i[2].
The following two pieces of code produce exactly the same output (that is, a list of entries separated by blank lines and lines with '---'):

    for i in bib_data.entries.items():
        print(i[0])  # prints the bibtex key
        print(i[1])  # prints the Entry object
        print('\n\n---\n\n')

    for e in bib_data.entries:
        print(e)  # prints the bibtex key
        print(bib_data.entries[e])  # prints the Entry object
        print('\n\n---\n\n')
This is the output (for each bibliographical entry) of the two previous pieces of code:
2006open Entry(u'electronic', fields=FieldDict([(u'Title', u'Open Source Critical Editions'), (u'Language', u'ENG'), (u'Note', u"Online proceedings of the a Workshop/Workgroup organised and supported by Methods Network, Perseus, and the Digital Classicist, held on Friday 22nd September 2006 at Centre for Computing in the Humanities, King's College London."), (u'Url', u'http://wiki.digitalclassicist.org/OSCE_Programme'), (u'Year', u'2006'), (u'__markedentry', u'[ilbuonme:1]'), (u'Keywords', u'Babel,DigiClass,Importante,Letto,Why'), (u'Owner', u'ilbuonme'), (u'Timestamp', u'2012.05.30')]), persons=OrderedCaseInsensitiveDict([]))
    import codecs

    import latexcodec  # registers the "latex" codec used by codecs.open() below
    import pybtex.database.input.bibtex
    import pybtex.plugin

    # load the 'plain' formatting style and the LaTeX output backend
    style = pybtex.plugin.find_plugin('pybtex.style.formatting', 'plain')()
    backend = pybtex.plugin.find_plugin('pybtex.backends', 'latex')()
    parser = pybtex.database.input.bibtex.Parser()

    with codecs.open("test.bib", encoding="latex") as stream:
        # this shows what the latexcodec does to the source
        print(stream.read())

    with codecs.open("test.bib", encoding="latex") as stream:
        data = parser.parse_stream(stream)

    # format every entry with the chosen style and print it
    for entry in style.format_entries(data.entries.values()):
        print(entry.text.render(backend))
I created a script based on this code. I added more HTML markup, the HTML header etc.
The pybtex test script run_bibtex.py (http://bazaar.launchpad.net/~pybtex-devs/pybtex/trunk/view/head:/pybtex/tests/run_bibtex.py) includes a number of practical functions, but there are also other useful test scripts on that website (e.g. to format person names).
/usr/local/lib/python2.7/dist-packages
and
/usr/lib/python2.7/dist-packages
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.