Create bibliographies

This section explains how to prepare documents for use with refdb and how to generate and process the bibliographies. First we'll look at SGML and XML documents, then at LaTeX documents.

SGML and XML documents

DocBook and TEI SGML and XML documents and their refdb bibliographies share many features, so they are treated together in this section. We'll cover how you specify citations, how you generate the bibliography, and how you transform the final document.

Prepare the document

The output of the refdbib application will be a bibliography element that contains all required references. You can redirect the output into a file and include this file as an external entity at the spot where your bibliography should appear. To achieve this you need two modifications in your document:

  1. Extend the document type declaration at the beginning of your document to declare the external entity. The first example is from a DocBook SGML document, the second one shows a TEI XML document:

    <!DOCTYPE BOOK PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
    <!ENTITY bibliography SYSTEM "foo.bib.sgml">
    ]>
    ...
               
    
    <?xml version="1.0"?> 
    <!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "http://www.tei-c.org/P4X/DTD/tei2.dtd" [
    <!ENTITY % TEI.general 'INCLUDE'>
    <!ENTITY % TEI.names.dates 'INCLUDE'>
    <!ENTITY % TEI.linking 'INCLUDE'>
    <!ENTITY % TEI.XML 'INCLUDE'>
    <!ENTITY bibliography SYSTEM "refdbtest.bib.xml">
    ]>
    ...
               
    

    The name of the entity is of course yours to choose, but using "bibliography" as in this example is pretty descriptive.

  2. Include the bibliography at the desired spot:

    ...
    &bibliography;
    ...
               
    

    You need to make sure that the included chunk of text is valid at the point where you want to include it. DocBook SGML and XML bibliographies are generated as bibliography elements, whereas TEI XML bibliographies are wrapped in div elements.

Create citations

Creating citations and bibliographies in SGML or XML documents with refdb is very similar to what you would do if you had to manually code the bibliographies - but without the sweat. First you create the citations. Each citation consists of one or more bibliographic references in the text, each of which points to one particular entry in the bibliography. Then you create a bibliography for all cited publications (and possibly some more). For an increased benefit you would certainly also want to create functional links from the citations to the corresponding bibliography entries, which would act as hyperlinks in suitable output formats like HTML or PDF. In real life, you would probably jump back and forth, adding a bibliography entry whenever you add a new citation, and invent suitable ID values for your bibliographic link targets as needed.

Note: The distinction made here between a citation and a bibliographical reference may sound like nitpicking, but it will be important when we deal with citations that contain more than one bibliographical reference.

refdb requires a slightly more formalized approach. You have to stick to a particular syntax when you create the citations, but the good news is that refdb does almost all of the rest. You will usually also create the citations first and let refdb create the bibliography just before you are ready to transform the first draft.

The particular syntax of citations and bibliographic references is necessary for two reasons. First, we have to tell refdb which bibliographic database entry we want to reference (and, optionally, from which database). Second, we need to encode which type of citation or reference we want. The exact markup depends on the DTD that your document uses, but the basics are the same.

In both DocBook and TEI documents, these two bits of information are encoded in attributes of elements that create a link from the reference to the bibliographic entry. In order to handle multiple citations correctly, these link elements need to be inside a wrapper element. For a DocBook document, basic citations therefore look like this:

<citation role="REFDB">      (1)
<xref linkend="ID1X">         (2)
</citation>
<citation role="REFDB">
<xref linkend="LITIBP-ID2X">  (3)
</citation>
       
(1)
The citation element is a wrapper for one or more bibliographic references. The role attribute is set to REFDB to distinguish this citation from other citation elements that refdb should leave alone. Each citation element can contain one or more xref elements.
(2)
Each xref element specifies one bibliographic reference. The value of the linkend attribute encodes which bibliographic item is referenced (in this case, the database entry with the ID 1) and how the reference should be rendered (see below). It consists of the string "ID" followed by the numerical database entry ID, and a trailing one-letter type specifier ("X" in this case). This simple form does not encode the database from which the reference is to be pulled. When generating the bibliography, you specify a default database from which all references without an explicit database name are taken. This form is most convenient if all your bibliographic items are stored in one database.
(3)
This xref element shows the syntax when an explicit database (LITIBP in this case) is specified. The attribute value consists of the database name, a dash, the string "ID", the numerical database entry ID, and the trailing type specifier. This form is mandatory only if you reference bibliographic entries from different databases in the same document (again, one database can be set as the default database in subsequent processing steps, so you could use the simple form for all references to entries in that particular database).

Note: This and the following DocBook examples are given in SGML notation. Keep in mind two things when working with XML documents:

  • The empty xref elements need a closing slash as in <xref linkend="ID2X"/>.

  • All attribute values relevant to refdb must be in uppercase. This restriction is imposed by the way citations are currently extracted from the document. It may be dropped in later versions though.

The corresponding syntax in a TEI XML document looks like this:

<seg type="REFDBCITATION" part="N" TEIform="seg">      (1)
<ptr targOrder="U" target="ID1X" TEIform="ptr"/>         (2)
</seg>
<seg type="REFDBCITATION" part="N" TEIform="seg">
<ptr targOrder="U" target="LITIBP-ID2X" TEIform="ptr"/>  (3)
</seg>
       
(1)
The general-purpose seg element with the type attribute set to REFDBCITATION is the citation wrapper for one or more bibliographic references.
(2)
Each bibliographic reference is specified by a ptr element whose target attribute encodes the bibliographic entry that is referenced. As explained in the DocBook example, this is the simple form that does not specify the database.
(3)
This is the corresponding bibliographic reference with the database specified.

Note: You don't have to worry about the attributes in the example which are not mentioned in the explanations. These are TEI default attributes which do not have anything to do with refdb (your XML editor will most likely create them automatically for you).

There are several ways to render citations and bibliographic references in the text. You select what you need by a trailing capital letter after the database ID (the "X" in the above examples). refdb will create several preformatted strings in the bibliography file which can be linked to by selecting the proper postfix. These preformatted strings have several purposes, as shown in the following table:

Table 9-1. Bibliographic reference types

Postfix Purpose
X The most common case. This is the first occurrence of a reference which is to be displayed outside the flow of the text. In numerical citation schemes this will be something like "(2)", in author-year citation schemes this may be rendered as "(Miller et al., 1992)".
S This is the same as X, but for a subsequent occurrence of the same reference. This distinction is important for some author-year citation schemes that print the full (or at least a longer) author list at the first occurrence and an abbreviated one at all subsequent occurrences of the same reference.
A This is the first occurrence of a reference that displays the author list inside the flow of the text, as in "Miller et al. reported recently (2001)...".
Q This is the same as A, but for subsequent occurrences of the same reference.
Y This type complements the author-only references mentioned above. In numerical citation schemes this is usually rendered like a normal reference, e.g. as "(2)", but in author-year citation schemes usually only the publication date is rendered, as in "(2001)".

Note: The exact formatting of these references, e.g. which citation style is used or which brackets surround the reference, is controlled by the style specification for a particular publication or publisher. This takes effect when you generate the bibliography and transform the final document.
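The attribute syntax described above (an optional database name and dash, the string "ID", the numerical entry ID, and a one-letter postfix) can be checked mechanically. The following Python sketch is not part of refdb; the function and class names are hypothetical, and the set of characters allowed in a database name is an assumption made for illustration only.

```python
import re
from typing import NamedTuple, Optional

# Postfix letters listed in Table 9-1.
REFERENCE_TYPES = {"X", "S", "A", "Q", "Y"}

# Optional database name plus dash, the literal "ID", the numeric entry ID,
# and a one-letter type postfix. The character class used for the database
# name is an assumption of this sketch, not a rule taken from refdb.
_TARGET_RE = re.compile(r"(?:(?P<db>[A-Z0-9_]+)-)?ID(?P<id>[0-9]+)(?P<type>[A-Z])")


class Reference(NamedTuple):
    database: Optional[str]  # None means "use the default database"
    entry_id: int
    ref_type: str


def parse_reference(value: str) -> Reference:
    """Split a linkend/target value such as 'LITIBP-ID2X' into its parts."""
    match = _TARGET_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a refdb reference: {value!r}")
    ref_type = match.group("type")
    if ref_type not in REFERENCE_TYPES:
        raise ValueError(f"unknown reference type {ref_type!r} in {value!r}")
    return Reference(match.group("db"), int(match.group("id")), ref_type)


print(parse_reference("ID1X"))
print(parse_reference("LITIBP-ID2X"))
```

Because the pattern only accepts uppercase letters, a lowercase value such as "id1x" is rejected, which mirrors the uppercase requirement for XML documents.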

An additional twist comes into play if you have multiple citations, i.e. a citation that contains more than one bibliographic reference. In most cases, all references are displayed inside of one pair of brackets. Some numerical citation styles require that bibliographic references with consecutive numbers be formatted as ranges within the same citation.
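To make the range formatting concrete, here is a small Python sketch that collapses consecutive reference numbers into ranges. It only illustrates the idea; it is not refdb's implementation. The brackets, the separator, and the minimum run length of three (the hypothetical min_run parameter) are assumptions, since the real output is governed by the bibliography style.

```python
from typing import Iterable, List


def collapse_ranges(numbers: Iterable[int], min_run: int = 3) -> str:
    """Format reference numbers, writing runs of consecutive numbers as ranges."""
    ordered = sorted(set(numbers))
    parts: List[str] = []
    start = 0
    while start < len(ordered):
        # Extend the run while the next number directly follows the current one.
        end = start
        while end + 1 < len(ordered) and ordered[end + 1] == ordered[end] + 1:
            end += 1
        if end - start + 1 >= min_run:
            parts.append(f"{ordered[start]}-{ordered[end]}")
        else:
            parts.extend(str(n) for n in ordered[start:end + 1])
        start = end + 1
    return "(" + ",".join(parts) + ")"


print(collapse_ranges([2, 3, 4, 7]))  # prints (2-4,7)
```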

Note: Formatting consecutive numbers as ranges kills the individual links from the references that make up a range to their bibliographic items. Any generated hyperlinks will therefore point to one common target for all members of a multiple citation. If this is not desired (e.g. to keep the links alive in an HTML presentation of a scientific document), you may override this behaviour during the transformation of the final document.

In order to format these cases properly, you need to include a dummy element whose sole purpose is to provide a link to an element that contains the combined, preformatted citation string. This is shown for a DocBook document in the following example.

<citation role="REFDB">
<xref endterm="THEFIRST" linkend="ID1" role="MULTIXREF"> (1)
<xref linkend="ID1X"> (2)
<xref linkend="ID14X">
<xref linkend="ID7X">
</citation>
       
(1)
This is the additional xref element which is mandatory in multiple citations. The linkend specifies the target of a link, which by convention could be the first of the following references. Note that the attribute value does not have a trailing type specifier. The element must have a role attribute with the value MULTIXREF. You also have to provide a unique value for the endterm attribute. This specifies the ID value that will be used in the corresponding element in the refdb-generated bibliography that contains the preformatted string for the multiple citation.
(2)
This and the following xref elements define the actual references that comprise the multiple citation.

Note: The sequence of the xref elements that encode the actual references may be important. Depending on the bibliography style used for the document transformation, the references may be displayed in the sequence as they were entered, or they may be rearranged according to the sequence of the bibliographic entries in the finished bibliography.

Also keep in mind that all attribute values must be in uppercase for the same reasons as stated above.
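The DocBook markup for a multiple citation is regular enough to generate from a list of references. The following Python sketch assembles it in SGML notation; the function name is hypothetical, and pointing the MULTIXREF element at the first reference follows the convention mentioned above rather than any requirement.

```python
from typing import Sequence


def docbook_multi_citation(endterm: str, references: Sequence[str]) -> str:
    """Return SGML markup for a citation that holds several references.

    `references` holds linkend values including the type postfix, e.g. "ID1X".
    """
    if len(references) < 2:
        raise ValueError("a multiple citation needs at least two references")
    # Attribute values are uppercased, as required for XML documents.
    refs = [ref.upper() for ref in references]
    # The MULTIXREF link target is the first reference without its type letter.
    first_target = refs[0][:-1]
    lines = [
        '<citation role="REFDB">',
        f'<xref endterm="{endterm.upper()}" linkend="{first_target}" role="MULTIXREF">',
    ]
    lines += [f'<xref linkend="{ref}">' for ref in refs]
    lines.append("</citation>")
    return "\n".join(lines)


print(docbook_multi_citation("THEFIRST", ["ID1X", "ID14X", "ID7X"]))
```

For an XML document the empty xref elements would additionally need the closing slash.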

The corresponding TEI citation is a little bit simpler:

<seg type="REFDBCITATION" part="N" TEIform="seg">
<ptr type="MULTIXREF" targOrder="U" target="THEFIRST" TEIform="ptr"/>         (1)
<ptr targOrder="U" target="ID1X" TEIform="ptr"/>         (2)
<ptr targOrder="U" target="LITIBP-ID21X" TEIform="ptr"/>
<ptr targOrder="U" target="ID5X" TEIform="ptr"/>
</seg>
       
(1)
This is the additional ptr element which is mandatory in multiple citations. The element must have a type attribute with the value MULTIXREF. You also have to provide a unique value for the target attribute. This specifies the ID value that will be used in the corresponding element in the refdb-generated bibliography. In contrast to DocBook elements, there is no way to specify where a link should point to. The refdb XSL stylesheets will use the first bibliographic entry referenced in a multiple citation as the link target.
(2)
This and the following ptr elements define the actual references that comprise the multiple citation.
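To see how the wrapper and its ptr elements fit together, the following Python sketch lists the references of every refdb citation in a TEI XML fragment, skipping the MULTIXREF helper element. It uses the standard xml.etree.ElementTree parser and is only an illustration; refdb itself extracts citations by other means, and the function name and the enclosing p element in the sample are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List


def tei_citations(xml_text: str) -> List[List[str]]:
    """Return the target values of each REFDBCITATION wrapper, in document order."""
    root = ET.fromstring(xml_text)
    citations = []
    for seg in root.iter("seg"):
        if seg.get("type") != "REFDBCITATION":
            continue  # some other use of the general-purpose seg element
        targets = [
            ptr.get("target", "")
            for ptr in seg.iter("ptr")
            if ptr.get("type") != "MULTIXREF"  # helper element, not a reference
        ]
        citations.append(targets)
    return citations


SAMPLE = """<p>
<seg type="REFDBCITATION" part="N" TEIform="seg">
<ptr type="MULTIXREF" targOrder="U" target="THEFIRST" TEIform="ptr"/>
<ptr targOrder="U" target="ID1X" TEIform="ptr"/>
<ptr targOrder="U" target="LITIBP-ID21X" TEIform="ptr"/>
<ptr targOrder="U" target="ID5X" TEIform="ptr"/>
</seg>
</p>"""

print(tei_citations(SAMPLE))
```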

Generate the bibliography

Unless you have good reasons not to, you should use the runbib shell script to generate the bibliography. This script greatly simplifies the task and offers a common interface for all supported document types. The following subsection explains how to use it. If you would rather do it the hard way (or if you want to peek under the hood), the manual steps are explained further down.

Use runbib

Let's assume you have a DocBook SGML document foo.sgml and want to submit it to the "Journal of Irreproducible Results". We further assume that the bibliography style for this famous periodical is stored in your database under the name "J.Irrep.Res." (see Manage bibliography styles to learn how it gets there). All your bibliography entries (at least those referenced without an explicit database name) are stored in the database mybib. Start the script from the directory that contains your document with the following command:

~$ runbib -d mybib -S "J.Irrep.Res." -t db31 foo.sgml

For a similar TEI XML document bar.xml you would run:

~$ runbib -d mybib -S "J.Irrep.Res." -t teix bar.xml

In both cases you will end up with a bibliography file (foo.bib.sgml and bar.bib.xml, respectively) as well as with a stylesheet (J.Irrep.Res.dsl) or a set of stylesheets (J.Irrep.Res.fo.xsl and J.Irrep.Res.html.xsl), respectively.

Note: Don't worry if you are greeted by a list of (Open)Jade errors complaining about missing elements when you first run this script on a particular document. Your document contains a number of crosslinks that point to elements that do not exist yet - you use runbib precisely to create these elements (you thus face a classic bootstrapping problem). As soon as the bibliography is created, these error messages should go away. Later you will only get an error message for each bibliographic entry that was added since the last time you ran runbib.

Do it the hard way

The following steps do exactly what the runbib script does, just with more to type. The only benefit of the hard way is that you have a chance to fiddle with the intermediate XML file which contains the list of bibliographic entries that should go into the bibliography. You can add further entries to extend the bibliography if you want to include uncited publications. The following procedure was written with a DocBook SGML document in mind, but transferring the commands to XML documents is straightforward. However, when working with XML documents there are additional steps required as outlined below.

  1. Extract the list of bibliographic references

    Use Jade or OpenJade with the citations.dsl stylesheet to create a list of the reference IDs (provide full paths as needed):

    ~$ jade -t sgml -d citations.dsl /usr/lib/sgml/declaration/docbook-3.1.dcl foo.sgml > foo.id.xml
    

    Be prepared for a long list of "missing ID" error messages. The elements with the IDs that the xref elements in the citations point to do not exist yet; they will be generated in the refdb bibliography output. If you process documents with more than 200 citations, you'll have to increase the maximum error limit of Jade in order to obtain all IDs the first time. After the first complete pass (including the steps outlined below), Jade will only complain about any additional citations that you have inserted since the last run.

    The output is a simple XML file that contains the information about all citation and xref elements with their relevant attributes. It is absolutely legal to extend this file with additional citation elements to specify references which are not cited but nonetheless should appear in the bibliography.

    Unfortunately, neither Jade nor OpenJade gets the Doctype line quite right. Both forget to insert a space between the public and the system identifier, which leaves you with a document that is not well-formed. Fire up your favourite editor and fix this line manually (insert a space between the two consecutive quotation marks on line 2).

    If you edit this intermediate XML file (that is, if you do more than just fixing the Doctype line), you should make sure that the result is still valid according to the CitationList XML DTD. refdb uses a non-validating parser to read this file so deviations from the DTD may slip through undetected and may have undesired consequences. The intermediate XML file carries the SYSTEM identifier of the CitationList XML DTD in the document type declaration. You may have to adapt the stylesheet citations.dsl to use the correct path for your local system.

    The following command line can be used to validate the document with (o)nsgmls (change the path to the XML declaration as necessary):

    ~$ nsgmls -wxml -s /usr/lib/sgml/declaration/xml.dcl foo.id.xml
    
  2. Create the bibliography file

    ~$ refdbib -d mybib -S "J.Irrep.Res." foo.id.xml > foo.bib.sgml
    

    This assumes that your reference database is called "mybib" and that you try to publish your paper in a journal that accepts the style with the name "J.Irrep.Res.".

    In addition to the bibliography file, refdbib will also create a DSSSL script containing the style specification. This file is a customized driver file for the RefDB-DocBook driver files and provides a couple of variable values specific for the given bibliography style.

  3. Post-processing

    This step is only required for XML documents. First we have to bring the stylesheets into shape, and if it is a TEI document, we'll also have to transform the bibliography file itself.

    refdbib creates a general-purpose XSL stylesheet which we need to turn into one FO and one HTML stylesheet. Create two copies of the file. If the stylesheet was e.g. J.Biol.Chem.xsl, you need one copy named J.Biol.Chem.fo.xsl and one copy named J.Biol.Chem.html.xsl. Scan the files for an import statement whose href attribute is surrounded with two "<!-- REFDBSTYLESHEET -->" comments. The value of this attribute must be set to the full path of the corresponding original stylesheet (DocBook FO or HTML, or TEI FO or HTML).

    If you're working on a TEI XML document, you'll have to transform the bibliography file itself. This is a DocBook SGML document and can be transformed easily with Jade/OpenJade and the bibdb2tei.dsl stylesheet.
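The manual Doctype repair described in step 1 (inserting a space between the two consecutive quotation marks on line 2) can also be scripted. The following Python sketch shows one way to do it; the function name is hypothetical, and the sample document uses placeholder identifiers and a placeholder root element, not the real CitationList ones.

```python
def fix_doctype_line(text: str) -> str:
    """Insert the missing space between the public and system identifiers.

    Only line 2 is touched, and only its first pair of adjacent quotation marks.
    """
    lines = text.split("\n")
    if len(lines) >= 2:
        lines[1] = lines[1].replace('""', '" "', 1)
    return "\n".join(lines)


# Hypothetical sample; the identifiers below are placeholders.
sample = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE citationlist PUBLIC "-//EXAMPLE//DTD Placeholder//EN""placeholder.dtd">\n'
    '<citationlist>\n'
    '</citationlist>\n'
)
print(fix_doctype_line(sample))
```

Running the function a second time leaves the text unchanged, so it is safe to apply to a file that has already been fixed.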

Transform the document

Finally you can transform the document to create printable or HTML output. In order to get the formatting of the citations and bibliography entries right you have to use the refdb driver files for the DocBook or TEI stylesheets.

In addition to the general modifications of these driver files we'll have to apply modifications specific for the particular reference style. Therefore you have to specify the DSSSL or XSL style specification file that was created in the previous step. For your convenience it is recommended to use the supplied refdbjade and refdbxml scripts for DSSSL and XSL transformations, respectively, which were designed for this task:

~$ refdbjade -t html -s J.Irrep.Res.dsl foo.sgml
~$ refdbxml -t pdf -s J.Irrep.Res.fo.xsl bar.xml

If you want to change the bibliography style of your document, all you need to do is to rerun runbib and refdbjade or refdbxml with the new parameters. No changes to your document source are necessary.

Note: If you want to create a bibliography for each part of a book or for each chapter, the procedure is not much different. The simplest approach is to keep the parts or chapters in individual files and process these individually as described above for the whole document. You'll get several bibliography files that you can include into the corresponding document source files.

Other SGML or XML document types

While refdb works out of the box with DocBook SGML/XML and TEI XML documents, it is by no means limited to these document types. The only native bibliography export format is DocBook (in a form suitable for both SGML and XML documents). TEI bibliographies are actually generated from this output with an SGML-to-XML transformation. The DocBook output has a sufficient granularity to allow this and possibly other transformations. If you want refdb to work with other document types, you have to do the following:

LaTeX/BibTeX

refdb integrates quite nicely with the LaTeX/BibTeX system. If you previously used a flat text file to store your BibTeX references, you will notice that there is only one additional command to run when you process your source document. Instead of keeping all of your references in a text file, refdbib will retrieve only the required references from the SQL database and store them in an intermediate text file.

  1. Prepare the document

    Use the LaTeX commands \cite and \nocite to include the references as usual. The extended commands from the natbib package should work as well. All these commands take an identifier for the reference as an argument. These identifiers come in two flavours, just like in DocBook documents. Either you use the same database for all references in the text; then you just specify the ID of the reference and tell the processing application which database to use. Or you specify the database name with each reference; in this case, you can pull references from different databases in the same document. The two versions look like this:

    \cite{ID1}
    \cite{litibp-ID2}
             
    

    The first version cites the reference with the identifier "1" in the database passed to the processing application as an argument. The second form cites the reference with the identifier "2" in the database "litibp".

    The LaTeX \bibliography command takes as an argument the name of the intermediate bibliography file without the extension. A simple choice would be the basename of your LaTeX document.

    Note: Keep in mind that even if you pull references from different refdb databases, you still need to specify only one reference database in your LaTeX document as refdb consolidates all cited references into one bibliography file.

  2. Create the auxiliary file

    Run the latex interpreter with the basename of your document (foo.tex) as an argument:

    ~$ latex foo
    

    latex will create, among other files, foo.aux. latex stores all sorts of information in these auxiliary files for later use in subsequent runs. The interesting part for us is the list of citations.

  3. Create the intermediate bibliography file

    Now refdb enters the stage. We process the auxiliary file to create a BibTeX bibliography tailored to our document. Either we do it manually:

    ~$ refdbib -d mybib -S name -t bibtex foo.aux > foo.bib
    

    Or we use the runbib shell script:

    ~$ runbib -d mybib -S name -t bibtex foo
    

    Remember that the basename of the file that receives the bibliographic information (foo.bib in our example) must match the name given in the bibliography command in the LaTeX document.

    The resulting bibliography file will contain all references that were requested from the LaTeX document. If you add more citations to this document, you have to run refdbib again to update the intermediate bibliography file (it won't hurt if you remove citations from your LaTeX document, though).

    Note: For the sake of consistency with bibtex, it is possible (though not necessary) to specify the auxiliary file without the .aux extension (foo in the above example).

  4. Run bibtex

    From here, everything runs as you are used to from LaTeX/BibTeX:

    ~$ bibtex foo
    
  5. Run latex

    Run latex on your LaTeX document at least twice to get all references right:

    ~$ latex foo && latex foo
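The list of citations that latex stores in the auxiliary file (step 2) consists of \citation{...} lines, one for each \cite or \nocite command. The following Python sketch shows how the two identifier flavours can be read from such a file and resolved against a default database. It illustrates the file contents only and does not claim to reproduce how refdbib processes them; the function name and the sample text are hypothetical.

```python
import re
from typing import List, Tuple

# One \citation{...} line is written to the .aux file per \cite or \nocite.
_CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")
# Optional database name plus dash, then "ID" and the numeric entry ID.
_KEY_RE = re.compile(r"(?:(?P<db>[^-\s]+)-)?ID(?P<id>[0-9]+)")


def cited_entries(aux_text: str, default_db: str) -> List[Tuple[str, int]]:
    """Return (database, entry ID) pairs in order of first citation."""
    seen: List[Tuple[str, int]] = []
    for match in _CITATION_RE.finditer(aux_text):
        # A single \cite may carry several comma-separated keys.
        for key in match.group(1).split(","):
            parsed = _KEY_RE.fullmatch(key.strip())
            if parsed is None:
                continue  # not a refdb-style key
            entry = (parsed.group("db") or default_db, int(parsed.group("id")))
            if entry not in seen:
                seen.append(entry)
    return seen


# Hypothetical excerpt of foo.aux.
sample_aux = "\\relax\n\\citation{ID1}\n\\citation{litibp-ID2,ID1}\n\\bibdata{foo}\n"
print(cited_entries(sample_aux, "mybib"))  # prints [('mybib', 1), ('litibp', 2)]
```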