Paolo Monella's do-it-yourself Pybtex cheatsheet

About this cheatsheet

At the moment, pybtex is not well documented. The main sources of documentation, apart from the source code itself, are pybtex's home page and especially The Friendly Manual of Pybtex.

I'm not good enough at Python to extract from pybtex's source code all the knowledge necessary to write scripts based on pybtex. But I did take a look at the source code and did some experiments, and this page is a summary of what I understood about how to use pybtex. Of course, most of this page comes from guessing, so I might be utterly wrong about a number of things.

For comments or corrections, see the contact details on my home page.

The bibliographical entry data type

This is what the pybtex bibliographical entry data type looks like (the indent is mine):

BibliographyData
(entries=OrderedCaseInsensitiveDict
  ({
        'adams93':  Entry  (
            'article',
            fields={
                'volume': '4',
                'title': 'The title of the work',
                'journal': 'The name of the journal',
                'number': '2',
                'month': '7',
                'note': 'An optional note',
                'year': '1993',
                'pages': '201-213'},
            persons={
                'author': [Person(u'Adams, Peter')]}
          )
    
  }), preamble=[]

)

Parse file

This is how you initialize the parser:

    from pybtex.database.input import bibtex
    parser = bibtex.Parser()
    bib_data = parser.parse_file('myfile.bibtex')

At this point, bib_data is an object including the whole bibliography.

bib_data.entries

bib_data.entries is a dictionary (an OrderedCaseInsensitiveDict, as shown in the data structure above) whose keys are the bibtex keys. Iterating over it yields those keys.

The following code

for e in bib_data.entries:
    print(bib_data.entries[e])

prints the entries (unformatted) one by one.

bib_data.entries[e]

Each value retrieved in the previous for loop is, in its turn, an object

bib_data.entries[e]

where

e is the bibtex key (e.g. adams93), while
entries[e] is an 'Entry' object. It is not a tuple: the round parentheses in the 'Entry' element above are simply how Python displays the call that builds the object.

That is why the code print(bib_data.entries[e][0]) doesn't work: it raises the error TypeError: 'Entry' object does not support indexing. Instead, the 'Entry' object has some properties, which are described below.

Some properties of the 'Entry' object

bib_data.entries[e].type

It's the entry type (electronic, book, article, incollection).

bib_data.entries[e].fields

It's a dictionary including all fields except 'author' and 'editor'. Pybtex manages the content of the 'author' and 'editor' fields not as strings, but as lists of 'Person' objects.

Also the type of the entry (book, article, etc.) is not included in the fields dictionary.

The keys of the fields dictionary are 'title', 'year' (case insensitive) etc.

The code bib_data.entries[e].fields[u'year'] yields something like '1993' (a string).

bib_data.entries[e].persons

It's an object of the type OrderedCaseInsensitiveDict, i.e. a specific kind of dictionary.

Basically, bib_data.entries[e].persons is a dictionary, so

print(bib_data.entries[e].persons[u'Editor'])

yields something like

[Person(u'Trevisan, M.'), Person(u'Gigliozzi, G.')]

However, it raises a KeyError if the bibliographical entry does not include an 'editor' field but, for example, only an 'author' field (or neither 'editor' nor 'author').
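
A small guard avoids that KeyError when you don't know in advance which roles an entry has. The following is only a minimal sketch: the helper name people_for is made up for this page, and a plain dictionary holding strings stands in for the persons dictionary and its 'Person' objects, so that the example runs without a .bib file.

```python
def people_for(persons, role):
    """Return the list stored under `role`, or an empty list if it is missing."""
    try:
        return persons[role]
    except KeyError:
        # The entry has no such role (e.g. no 'editor' field).
        return []


# Stand-in for bib_data.entries[e].persons: an author but no editor.
persons = {'author': ['Adams, Peter']}

authors = people_for(persons, 'author')
editors = people_for(persons, 'editor')
print(authors)
print(editors)
```

With real data you would pass bib_data.entries[e].persons as the first argument.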

bib_data.entries[e].persons['editor'][0].first()

bib_data.entries[e].persons['editor'] is a list of 'Person' objects, so first() has to be called on one of its elements (here the first one, with index 0), not on the list itself. In what follows, p stands for one such 'Person' object.

p.first() is a list with an element for each first name of that author/editor. If the person has only one first name, it's still a list, including one element only.

I'm not sure of what pybtex does when there are many first names (e.g. George Walker Bush) or many last names. I know for sure pybtex treats cases like "C.M." (no space in between) or "Jean-Daniel" (hyphen, no space in between) as one Unicode string ("Jean-Daniel" is not split into "Jean" and "Daniel").

The following code yields lists with other parts of the name:

p.middle()
p.last()
p.prelast()    # the "von" part
p.lineage()    # the "junior" part

Recent pybtex versions deprecate these methods in favour of the attributes p.first_names, p.middle_names, p.prelast_names, p.last_names and p.lineage_names, which hold the same lists.

As the aforementioned pieces of code return lists (not strings), if you want to print a name, you need to use join(), like this (p is one 'Person' object; joining with a space keeps multiple name parts separated):

print(' '.join(p.first()), ' '.join(p.last()))
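
To put several parts together in one go, the lists can be concatenated before joining. This is a sketch with a made-up helper name (full_name) and hand-written lists standing in for what the name methods return; it does not show how pybtex itself splits any particular name.

```python
def full_name(*part_lists):
    """Join several lists of name parts into one printable string."""
    words = [word for parts in part_lists for word in parts]
    return ' '.join(words)


# Hand-written stand-ins for the lists of first, middle and last names.
first = ['George']
middle = ['Walker']
last = ['Bush']

name = full_name(first, middle, last)
print(name)
```

Empty lists (e.g. no middle name) simply contribute nothing, so no extra spaces appear.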

What if you want to search among names?

if 'Monella' in p.last():

matches 'Monella' if 'Monella' is the whole family name, but

if 'ella' in p.last():

doesn't match 'Monella'! In this case you have to use:

if 'ella' in ''.join(p.last()):

which matches both 'Monella' and 'Stella'.
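
The difference comes from how Python's in operator works: on a list it compares whole elements, on a string it looks for a substring. The sketch below shows both behaviours with hand-written lists standing in for the last-name list; the two function names are made up for illustration.

```python
def has_last_name(last_parts, name):
    """True only if `name` equals one whole element of the list."""
    return name in last_parts


def last_name_contains(last_parts, fragment):
    """True if `fragment` occurs anywhere inside the joined last name."""
    return fragment in ' '.join(last_parts)


# Hand-written stand-ins for the list of last-name parts.
monella = ['Monella']
stella = ['Stella']

whole = has_last_name(monella, 'Monella')
partial_on_list = has_last_name(monella, 'ella')
partial_joined = [last_name_contains(parts, 'ella') for parts in (monella, stella)]
print(whole, partial_on_list, partial_joined)
```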

bib_data.entries.items()

bib_data.entries.items() yields one (key, entry) pair for each bibliographical entry. In each pair i:

i[0] is the bibtex key (e.g. adams93). As we saw above, you can also get that key with:
for e in bib_data.entries:
    print(e)

This 'for' loop prints all bibtex keys in the bibliography.
i[1] is the whole 'Entry' object (for which see above), which in its turn includes three objects:
  • the entry type (electronic, book, article etc.)
  • the fields
  • the persons

There is no i[2], because each pair has exactly two elements.

The following two pieces of code return exactly the same output (that is a list of entries separated by blank lines and lines with '---'):

  1. for i in bib_data.entries.items():
        print(i[0])    # returns the bibtex key
        print(i[1])    # returns the Entry object
        print('\n\n---\n\n')
    	
  2. for e in bib_data.entries:
        print(e)                    # returns the bibtex key
        print(bib_data.entries[e])  # returns the Entry object
        print('\n\n---\n\n')
    	

This is the output (for each bibliographical entry) of the two previous pieces of code:

2006open
Entry(u'electronic', fields=FieldDict([(u'Title', u'Open Source Critical Editions'), (u'Language', u'ENG'), (u'Note', u"Online proceedings of the a Workshop/Workgroup organised and supported by Methods Network, Perseus, and the Digital Classicist, held on Friday 22nd September 2006 at Centre for Computing in the Humanities, King's College London."), (u'Url', u'http://wiki.digitalclassicist.org/OSCE_Programme'), (u'Year', u'2006'), (u'__markedentry', u'[ilbuonme:1]'), (u'Keywords', u'Babel,DigiClass,Importante,Letto,Why'), (u'Owner', u'ilbuonme'), (u'Timestamp', u'2012.05.30')]), persons=OrderedCaseInsensitiveDict([]))
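
The first of the two loops can also be written with tuple unpacking, which avoids the numeric indexes. The sketch below uses an ordinary OrderedDict with placeholder strings in place of bib_data.entries and its 'Entry' objects, so it runs on its own; only the shape of the loop is the point.

```python
from collections import OrderedDict

# Stand-in for bib_data.entries: bibtex keys mapped to placeholder strings
# (with real data the values would be Entry objects).
entries = OrderedDict([
    ('adams93', 'Entry for adams93'),
    ('2006open', 'Entry for 2006open'),
])

pairs = list(entries.items())
for key, entry in pairs:
    # Each pair holds exactly two elements: the key and the value.
    print(key)
    print(entry)
    print('\n\n---\n\n')
```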

Export bibtex to HTML through pybtex

From http://stackoverflow.com/questions/19751402/does-pybtex-support-accent-special-characters-in-bib-file I got the following piece of code (adapted here to Python 3):

import codecs

import latexcodec  # registers the "latex" encoding used below
import pybtex.database.input.bibtex
import pybtex.plugin

style = pybtex.plugin.find_plugin('pybtex.style.formatting', 'plain')()
# The 'latex' backend comes from the original answer; pybtex also ships an 'html' backend.
backend = pybtex.plugin.find_plugin('pybtex.backends', 'latex')()
parser = pybtex.database.input.bibtex.Parser()
with codecs.open("test.bib", encoding="latex") as stream:
    # this shows what the latexcodec does to the source
    print(stream.read())
with codecs.open("test.bib", encoding="latex") as stream:
    data = parser.parse_stream(stream)
for entry in style.format_entries(data.entries.values()):
    print(entry.text.render(backend))

I created a script based on this code. I added more HTML markup, the HTML header etc.
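
That script is not reproduced here. As a rough idea of what wrapping the output in an HTML page can look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the function name build_html_page, the page layout, and the sample entry string, which is hand-written rather than real pybtex output. It expects strings that are already rendered as HTML, so it does no escaping.

```python
def build_html_page(rendered_entries, title='Bibliography'):
    """Wrap already-rendered entries in a minimal HTML page."""
    lines = [
        '<!DOCTYPE html>',
        '<html>',
        '<head>',
        '<meta charset="utf-8">',
        '<title>{}</title>'.format(title),
        '</head>',
        '<body>',
        '<ol>',
    ]
    for text in rendered_entries:
        # One list item per bibliographical entry.
        lines.append('<li>{}</li>'.format(text))
    lines.extend(['</ol>', '</body>', '</html>'])
    return '\n'.join(lines)


# Hand-written sample standing in for one rendered entry.
sample = ['Peter Adams. <em>The title of the work</em>. 1993.']
page = build_html_page(sample)
print(page)
```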

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.