This section explains how to prepare documents for use with refdb and how to generate and process the bibliographies. First we'll look at SGML and XML documents, then at LaTeX documents.
DocBook and TEI SGML and XML documents and their refdb bibliographies share many features, so they are treated together in this section. We'll cover how you specify citations, how you generate the bibliography, and how you transform the final document.
The output of the refdbib application will be a bibliography element that contains all required references. You can redirect the output into a file and include this file as an external entity at the spot where your bibliography should appear. To achieve this you need two modifications in your document:
Extend the document type declaration at the beginning of your document to declare the external entity. The first example is from a DocBook SGML document, the second one shows a TEI XML document:
<!DOCTYPE BOOK PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY bibliography SYSTEM "foo.bib.sgml">
]>
...
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "http://www.tei-c.org/P4X/DTD/tei2.dtd" [
<!ENTITY % TEI.general 'INCLUDE'>
<!ENTITY % TEI.names.dates 'INCLUDE'>
<!ENTITY % TEI.linking 'INCLUDE'>
<!ENTITY % TEI.XML 'INCLUDE'>
<!ENTITY bibliography SYSTEM "refdbtest.bib.xml">
]>
...
The name of the entity is of course yours to choose, but using "bibliography" as in this example is pretty descriptive.
Include the bibliography at the desired spot:
...
&bibliography;
...
You need to make sure that the included chunk of text is valid at the point where you include it. DocBook SGML and XML bibliographies are generated as bibliography elements, whereas TEI XML bibliographies are wrapped in div elements.
Creating citations and bibliographies in SGML or XML documents with refdb is very similar to what you would do if you had to code the bibliographies manually - but without the sweat. First you create the citations. Each citation consists of one or more bibliographic references in the text, each of which points to one particular entry in the bibliography. Then you create a bibliography for all cited publications (and possibly some more). You would usually also want to link the citations to the corresponding bibliography entries, so that they act as hyperlinks in suitable output formats like HTML or PDF. In real life, you would probably jump back and forth, adding a bibliography entry whenever you add a new citation, and inventing suitable ID values for your bibliographic link targets as needed.
Note: The distinction made here between a citation and a bibliographic reference may sound like nitpicking, but it will be important when we deal with citations that contain more than one bibliographic reference.
refdb requires a slightly more formalized approach. You have to stick to a particular syntax when you create the citations, but the good news is that refdb does almost all of the rest. You will usually also create the citations first and let refdb create the bibliography just before you are ready to transform the first draft.
The particular syntax of citations and bibliographic references is necessary for two reasons: first, we have to tell refdb which bibliographic database entry we want to reference (and possibly from which database). Second, we need to encode which type of citation or reference we want. The exact markup depends on the DTD that your document uses, but the basics are the same.
In both DocBook and TEI documents, these two bits of information are encoded in attributes of elements that create a link from the reference to the bibliographic entry. In order to handle multiple citations correctly, these link elements need to be inside a wrapper element. For a DocBook document, basic citations therefore look like this:
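<citation role="REFDB">
<xref linkend="ID1X">
</citation>

<citation role="REFDB">
<xref linkend="LITIBP-ID2X">
</citation>

The first citation refers to the reference with the ID 1 in the default database, the second to the reference with the ID 2 in the database litibp. The trailing "X" selects the type of reference, as explained below.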
Note: This and the following DocBook examples are given in SGML notation. Keep in mind two things when working with XML documents:
The empty xref elements need a closing slash as in <xref linkend="ID2X"/>.
All attribute values relevant to refdb must be in uppercase. This restriction is imposed by the way citations are currently extracted from the document. It may be dropped in later versions though.
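If you generate citations from a script, the two notes above translate into simple rules. The following Python sketch builds a refdb citation wrapper for DocBook in SGML or XML notation. The function name docbook_citation is hypothetical and not part of refdb; its output simply mirrors the single-citation examples shown earlier.

```python
def docbook_citation(linkends, xml=False):
    """Build a DocBook citation wrapper for refdb (hypothetical helper).

    linkends: link targets such as "ID1X" or "litibp-ID2X".
    xml: write the empty xref elements in XML notation (<xref .../>).
    """
    # SGML allows the empty xref element to end with ">", XML needs "/>".
    close = "/>" if xml else ">"
    parts = ['<citation role="REFDB">']
    for target in linkends:
        # refdb expects attribute values in uppercase, so normalize them here.
        parts.append(f'<xref linkend="{target.upper()}"{close}')
    parts.append("</citation>")
    return " ".join(parts)


print(docbook_citation(["ID1X"]))
# <citation role="REFDB"> <xref linkend="ID1X"> </citation>
print(docbook_citation(["litibp-ID2X"], xml=True))
# <citation role="REFDB"> <xref linkend="LITIBP-ID2X"/> </citation>
```

Uppercasing in the helper, rather than rejecting lowercase input, is a convenience choice; it keeps the uppercase restriction out of your way while typing.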
The corresponding syntax in a TEI XML document looks like this:
Note: You don't have to worry about the attributes in the example which are not mentioned in the explanations. These are TEI default attributes which do not have anything to do with refdb (your XML editor will most likely create them automatically for you).
There are several ways to render citations and bibliographic references in the text. You select the one you need with a trailing capital letter after the ID of the database entry (the "X" in the above examples). refdb creates several preformatted strings in the bibliography file, and you link to the appropriate one by choosing the corresponding postfix. These preformatted strings serve the purposes shown in the following table:
Table 9-1. Bibliographic reference types
Postfix | Purpose |
---|---|
X | The most common case. This is the first occurrence of a reference which is to be displayed outside the flow of the text. In numerical citation schemes this will be something like "(2)", in author-year citation schemes this may be rendered as "(Miller et al., 1992)". |
S | This is the same as X, but for a subsequent occurrence of the same reference. This distinction is important for some author-year citation schemes that print the full (or at least a longer) author list at the first occurrence and an abbreviated one at all subsequent occurrences of the same reference. |
A | This is the first occurrence of a reference that displays the author list inside the flow of the text, as in "Miller et al. reported recently (2001)...". |
Q | This is the same as A, but for subsequent occurrences of the same reference. |
Y | This type complements the author-only references mentioned above. In numerical citation schemes this is usually rendered like a normal reference, e.g. as "(2)", but in author-year citation schemes usually only the publication date is rendered, as in "(2001)". |
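Taken together, a link target encodes up to three pieces of information: an optional database name, the reference ID and the postfix from Table 9-1. The Python sketch below takes such a target apart. It assumes targets look exactly like the examples in this section (optional "DATABASE-" prefix, "ID" plus a numeric ID, one postfix letter); parse_target and TARGET_RE are hypothetical names, and the check is not how refdb itself extracts citations.

```python
import re

# Postfix letters from Table 9-1 and what they select.
REFERENCE_TYPES = {
    "X": "first occurrence, outside the flow of the text",
    "S": "subsequent occurrence, outside the flow of the text",
    "A": "first occurrence, author list inside the flow of the text",
    "Q": "subsequent occurrence, author list inside the flow of the text",
    "Y": "complement to A/Q (often the year only)",
}

# Assumption: targets follow the examples in this section, i.e. an optional
# "DATABASE-" prefix, "ID" plus a numeric reference ID, and one postfix letter.
TARGET_RE = re.compile(
    r"^(?:(?P<db>[A-Z0-9_.]+)-)?ID(?P<id>[0-9]+)(?P<type>[XSAQY])$"
)


def parse_target(target):
    """Split a refdb link target into (database or None, reference ID, postfix)."""
    m = TARGET_RE.match(target)
    if m is None:
        raise ValueError(f"not a refdb citation target: {target!r}")
    return m.group("db"), int(m.group("id")), m.group("type")


for t in ["ID1X", "LITIBP-ID2X", "ID14A"]:
    db, ref_id, kind = parse_target(t)
    print(t, db, ref_id, kind, REFERENCE_TYPES[kind])
```

Because the pattern only accepts uppercase letters, it also flags lowercase targets that would violate the uppercase rule for XML documents.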
Note: The exact formatting of these references, e.g. which citation style is used or which brackets surround the reference, is controlled by the style specification for a particular publication or publisher. This takes effect when you generate the bibliography and transform the final document.
An additional twist comes into play with multiple citations, i.e. citations that contain more than one bibliographic reference. In most cases, all references are displayed inside one pair of brackets. Some numerical citation styles also require that bibliographic references with consecutive numbers be formatted as ranges within the same citation.
Note: Formatting consecutive numbers as ranges removes the individual links from the references that make up a range to their bibliographic items. Any generated hyperlinks will therefore point to one common target for all members of a multiple citation. If this is not desired (e.g. to keep the links alive in an HTML presentation of a scientific document), you can override this behaviour during the transformation of the final document.
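To see why the individual links get lost, consider how a range is formed. The Python sketch below collapses runs of consecutive citation numbers: once 1, 2 and 3 become "1-3", the number 2 no longer appears in the text, so there is nothing left to attach its hyperlink to. The function name and the min_run, sep and dash parameters are hypothetical; the actual rules come from the bibliography style.

```python
def collapse_ranges(numbers, min_run=3, sep=", ", dash="-"):
    """Format citation numbers, collapsing consecutive runs into ranges.

    Runs shorter than min_run are listed individually (e.g. "7, 8").
    """
    nums = sorted(set(numbers))
    out = []
    i = 0
    while i < len(nums):
        j = i
        # Extend j while the next number continues the run.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        run = nums[i:j + 1]
        if len(run) >= min_run:
            out.append(f"{run[0]}{dash}{run[-1]}")
        else:
            out.extend(str(n) for n in run)
        i = j + 1
    return sep.join(out)


print(collapse_ranges([3, 1, 2, 5, 8, 7]))             # 1-3, 5, 7, 8
print(collapse_ranges([3, 1, 2, 5, 8, 7], min_run=2))  # 1-3, 5, 7-8
```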
In order to format these cases properly, you need to include a dummy element whose sole purpose is to provide a link to an element that contains the combined, preformatted citation string. This is shown for a DocBook document in the following example.
<citation role="REFDB">
<xref endterm="THEFIRST" linkend="ID1" role="MULTIXREF">
<xref linkend="ID1X">
<xref linkend="ID14X">
<xref linkend="ID7X">
</citation>
Note: The order of the xref elements that encode the actual references may be important. Depending on the bibliography style used for the document transformation, the references may be displayed in the order in which they were entered, or they may be rearranged according to the order of the bibliographic entries in the finished bibliography.
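The two ordering behaviours described in the note can be illustrated with a tiny Python sketch. It assumes you already know the position of each reference in the finished bibliography; the positions used below are made up for illustration, and citation_numbers is a hypothetical helper, not part of refdb.

```python
def citation_numbers(entered, bib_numbers, rearrange=True):
    """Return the numbers to print for one multiple citation.

    entered: link targets in the order of the xref elements.
    bib_numbers: hypothetical mapping from target to position in the bibliography.
    rearrange: True sorts by bibliography order, False keeps the entered order.
    """
    numbers = [bib_numbers[target] for target in entered]
    return sorted(numbers) if rearrange else numbers


# The three references from the DocBook example above, with made-up positions.
positions = {"ID1X": 1, "ID14X": 3, "ID7X": 2}
print(citation_numbers(["ID1X", "ID14X", "ID7X"], positions))         # [1, 2, 3]
print(citation_numbers(["ID1X", "ID14X", "ID7X"], positions, False))  # [1, 3, 2]
```

With a style that rearranges references, the entered order does not matter; with one that keeps it, the order of the xref elements is what the reader sees.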
Also keep in mind that all attribute values must be in uppercase, for the same reasons as stated above.
The corresponding TEI citation is a little bit simpler:
Unless you have good reasons not to, you should use the runbib shell script to generate the bibliography. This script greatly simplifies the task and offers a common interface for all supported document types. The following subsection explains how to use it. If you'd rather do it the hard way (or want to peek under the hood), you'll find the individual steps explained further down.
Let's assume you have a DocBook SGML document foo.sgml and want to submit it to the "Journal of Irreproducible Results". We further assume that the bibliography style for this famous periodical is stored in your database under the name "J.Irrep.Res." (see Manage bibliography styles to learn how it gets there). All your bibliography entries (at least those referenced without an explicit database name) are stored in the database mybib. Start the script from the directory that contains your document with the following command:
~$ runbib -d mybib -S "J.Irrep.Res." -t db31 foo.sgml
For a similar TEI XML document bar.xml you would run:
~$ runbib -d mybib -S "J.Irrep.Res." -t teix bar.xml
In both cases you end up with a bibliography file (foo.bib.sgml and bar.bib.xml, respectively) and with the style-specific stylesheets: a DSSSL stylesheet (J.Irrep.Res.dsl) for the SGML document, and a pair of XSL stylesheets (J.Irrep.Res.fo.xsl and J.Irrep.Res.html.xsl) for the XML document.
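If you drive runbib from a build script, it helps to know which files to expect afterwards. The Python sketch below builds the command line from the examples above and predicts the output names. It only covers the two document types used here (db31 and teix), and the rule for deriving stylesheet names from the style name (dropping the trailing dot) is an assumption inferred from the file names in the text.

```python
from pathlib import PurePath

# Document types used with runbib in this section; other types are not covered.
DOCTYPE_KIND = {"db31": "sgml", "teix": "xml"}


def runbib_argv(database, style, doctype, document):
    """Return the runbib command from the examples above as an argument list."""
    return ["runbib", "-d", database, "-S", style, "-t", doctype, document]


def expected_outputs(style, doctype, document):
    """Predict the files runbib leaves behind, based on the examples above."""
    if doctype not in DOCTYPE_KIND:
        raise ValueError(f"document type {doctype!r} not covered by this sketch")
    doc = PurePath(document)
    bibfile = f"{doc.stem}.bib{doc.suffix}"  # foo.sgml -> foo.bib.sgml
    base = style.rstrip(".")  # assumption: "J.Irrep.Res." -> "J.Irrep.Res"
    if DOCTYPE_KIND[doctype] == "sgml":
        return bibfile, [f"{base}.dsl"]
    return bibfile, [f"{base}.fo.xsl", f"{base}.html.xsl"]


print(runbib_argv("mybib", "J.Irrep.Res.", "db31", "foo.sgml"))
print(expected_outputs("J.Irrep.Res.", "db31", "foo.sgml"))
print(expected_outputs("J.Irrep.Res.", "teix", "bar.xml"))
```

Passing the argument list to subprocess.run(argv, check=True) runs the command without a shell, so the style name needs no extra quoting.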
Note: Don't worry if you are greeted by a list of (Open)Jade errors complaining about missing elements when you first run this script on a particular document. Your document contains a number of crosslinks that point to elements that do not exist yet - you use runbib precisely to create these elements (you thus face a classic bootstrapping problem). As soon as the bibliography is created, these error messages should go away. Later you will only get an error message for each bibliographic entry that was added since the last time you ran runbib.
The following steps do exactly what the runbib script does, only with more typing. The only benefit of the hard way is that you get a chance to fiddle with the intermediate XML file that contains the list of bibliographic entries that should go into the bibliography. For example, you can add further entries to include uncited publications in the bibliography. The following procedure was written with a DocBook SGML document in mind, but the commands transfer easily to XML documents. XML documents do, however, require some additional steps, as outlined below.
Extract the list of bibliographic references
Use Jade or OpenJade with the citations.dsl stylesheet to create a list of the reference IDs (provide full paths as needed):
~$ jade -t sgml -d citations.dsl /usr/lib/sgml/declaration/docbook-3.1.dcl foo.sgml > foo.id.xml
Be prepared for a long list of "missing ID" error messages. They appear because the elements with the IDs that the xref elements in the citations point to do not exist yet; they will be generated in the refdb bibliography output. If you process documents with more than 200 citations, you'll have to increase Jade's maximum error limit (the -E option) in order to obtain all IDs in the first pass. After the first complete pass (including the steps outlined below), Jade will only complain about citations that you have inserted since the last run.
The output is a simple XML file that contains the information about all citation and xref elements with their relevant attributes. It is absolutely legal to extend this file with additional citation elements to specify references which are not cited but nonetheless should appear in the bibliography.
Unfortunately, neither Jade nor OpenJade gets the Doctype line quite right: both forget to insert a space between the public and the system identifier, which leaves you with a document that is not well-formed. Fire up your favourite editor and fix this line manually (insert a space between the two consecutive quotation marks on line 2).
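If you run this step often, the manual fix can be scripted. The Python sketch below inserts the missing space in the Doctype line only, leaving the rest of the file alone. The function names are hypothetical, the public identifier in the demo is made up, and fix_doctype_file assumes the Jade output is UTF-8 encoded.

```python
def fix_doctype(text):
    """Insert the missing space between the public and system identifier.

    Only the Doctype line is touched: its first pair of consecutive
    quotation marks ('""') becomes '" "'.
    """
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line.lstrip().startswith("<!DOCTYPE"):
            lines[i] = line.replace('""', '" "', 1)
            break
    return "\n".join(lines)


def fix_doctype_file(path):
    """Rewrite a Jade-generated citation list such as foo.id.xml in place."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        fixed = fix_doctype(f.read())
    with open(path, "w", encoding="utf-8") as f:
        f.write(fixed)


# Demo with a made-up public identifier.
broken = ('<?xml version="1.0"?>\n'
          '<!DOCTYPE citationlist PUBLIC "-//Example//DTD CitationList//EN""citationlist.dtd">\n')
print(fix_doctype(broken).splitlines()[1])
```

The function is idempotent: running it on an already fixed file changes nothing, because the corrected line no longer contains two consecutive quotation marks.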
If you edit this intermediate XML file (that is, if you do more than just fix the Doctype line), you should make sure that the result is still valid according to the CitationList XML DTD. refdb uses a non-validating parser to read this file, so deviations from the DTD may slip through undetected and have undesired consequences. The intermediate XML file carries the SYSTEM identifier of the CitationList XML DTD in the document type declaration. You may have to adapt the stylesheet citations.dsl so that it uses the correct path for your local system.
The following command line can be used to validate the document with (o)nsgmls (change the path to the XML declaration as necessary):
~$ nsgmls -wxml -s /usr/lib/sgml/declaration/xml.dcl foo.id.xml
Create the bibliography file
~$ refdbib -d mybib -S "J.Irrep.Res." foo.id.xml > foo.bib.sgml
This assumes that your reference database is called "mybib" and that you try to publish your paper in a journal that accepts the style with the name "J.Irrep.Res.".
In addition to the bibliography file, refdbib also creates a DSSSL stylesheet containing the style specification. This file is a customized driver file for the RefDB-DocBook driver files and provides a couple of variable values specific to the given bibliography style.
Post-processing
This step is only required for XML documents. First we have to bring the stylesheets into shape, and if it is a TEI document, we'll also have to transform the bibliography file itself.
refdbib creates a general-purpose XSL stylesheet which we need to turn into one FO and one HTML stylesheet. Create two copies of the file: if the stylesheet is, e.g., J.Biol.Chem.xsl, you need one copy named J.Biol.Chem.fo.xsl and one named J.Biol.Chem.html.xsl. In each copy, look for the import statement whose href attribute is surrounded by two "<!-- REFDBSTYLESHEET -->" comments. Set the value of this attribute to the full path of the corresponding original stylesheet (DocBook FO or HTML, or TEI FO or HTML).
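Copying and editing the two stylesheets by hand is easy to get wrong, so here is a Python sketch that automates it. The exact layout of the marker comments is an assumption: the code replaces the first double-quoted string found between the two "<!-- REFDBSTYLESHEET -->" comments, which works whether the markers wrap the whole import element or only the attribute value. The function names and the stylesheet paths in the usage comment are hypothetical; use the paths of your local DocBook or TEI installation.

```python
import re
from pathlib import Path

MARKER = "<!-- REFDBSTYLESHEET -->"

# Assumption: the first double-quoted string between the two marker comments
# is the href value. This holds whether the markers wrap the whole import
# element or only the attribute value.
MARKED_HREF = re.compile(
    re.escape(MARKER) + r'(.*?)"[^"]*"(.*?)' + re.escape(MARKER), re.DOTALL
)


def set_import_href(xsl_text, path):
    """Replace the marked href value with the full path of an original stylesheet."""
    def repl(m):
        # A function (not a template string) keeps backslashes in paths literal.
        return f'{MARKER}{m.group(1)}"{path}"{m.group(2)}{MARKER}'

    new_text, count = MARKED_HREF.subn(repl, xsl_text, count=1)
    if count == 0:
        raise ValueError("no REFDBSTYLESHEET markers found")
    return new_text


def split_stylesheet(xsl_file, fo_stylesheet, html_stylesheet):
    """Create NAME.fo.xsl and NAME.html.xsl next to NAME.xsl (e.g. J.Biol.Chem.xsl)."""
    src = Path(xsl_file)
    text = src.read_text(encoding="utf-8")
    written = []
    for kind, original in (("fo", fo_stylesheet), ("html", html_stylesheet)):
        target = src.with_name(f"{src.stem}.{kind}.xsl")
        target.write_text(set_import_href(text, original), encoding="utf-8")
        written.append(target)
    return written


# Example with hypothetical paths:
# split_stylesheet("J.Biol.Chem.xsl",
#                  "/path/to/docbook-xsl/fo/docbook.xsl",
#                  "/path/to/docbook-xsl/html/docbook.xsl")
```

Raising an error when no markers are found is deliberate: a silently unchanged copy would fail later, during the transformation, with a far less obvious message.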
If you're working on a TEI XML document, you'll have to transform the bibliography file itself. This is a DocBook SGML document and can be transformed easily with Jade/OpenJade and the bibdb2tei.dsl stylesheet.
Finally you can transform the document to create printable or HTML output. In order to get the formatting of the citations and bibliography entries right you have to use the refdb driver files for the DocBook or TEI stylesheets.
In addition to the general modifications in these driver files, we have to apply modifications specific to the particular reference style. Therefore you have to specify the DSSSL or XSL style specification file that was created in the previous step. For convenience, use the supplied refdbjade and refdbxml scripts for DSSSL and XSL transformations, respectively, which were designed for this task:
~$ refdbjade -t html -s J.Irrep.Res.dsl foo.sgml
~$ refdbxml -t pdf -s J.Irrep.Res.fo.xsl bar.xml
If you want to change the bibliography style of your document, all you need to do is rerun runbib and refdbjade or refdbxml with the new parameters. No changes to your document source are necessary.
Note: If you want to create a bibliography for each part of a book or for each chapter, the procedure is not much different. The simplest approach is to keep the parts or chapters in individual files and process these individually as described above for the whole document. You'll get several bibliography files that you can include into the corresponding document source files.
While refdb works out of the box with DocBook SGML/XML and TEI XML documents, it is by no means limited to these document types. The only native bibliography export format is DocBook (in a form suitable for both SGML and XML documents). TEI bibliographies are actually generated from this output with an SGML-to-XML transformation. The DocBook output is granular enough to allow this and possibly other transformations. If you want refdb to work with other document types, you have to do the following:
Either extend citations.dsl or create an additional stylesheet suitable to extract a list of citations and references conforming to the CitationList XML DTD.
Provide a DSSSL stylesheet to transform the DocBook SGML bibliography output to your target document type.
Modify your DSSSL or XSL stylesheets (or better, provide suitable driver files) to make use of the extended formatting information both in the bibliography and in the refdb-created driver files.
refdb integrates quite nicely with the LaTeX/BibTeX system. If you previously used a flat text file to store your BibTeX references, you will notice that there is only one additional command to run when you process your source document. Instead of keeping all of your references in a text file, refdbib will retrieve only the required references from the SQL database and store them in an intermediate text file.
Prepare the document
Use the LaTeX commands \cite and \nocite to include the references as usual. The extended commands from the natbib package should work as well. All these commands take an identifier for the reference as an argument. Just as in DocBook documents, these reference identifiers come in two flavours: either you use the same database for all references in the text, in which case you specify only the ID of the reference and tell the processing application which database to use; or you specify the database name with each reference, which lets you pull references from different databases into the same document. The two versions look like this:
\cite{ID1}
\cite{litibp-ID2}
The first version cites the reference with the identifier "1" in the database passed to the processing application as an argument. The second form cites the reference with the identifier "2" in the database "litibp".
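The mapping from a citation key to a database and an identifier can be written down in a few lines. This Python sketch follows the description above ("ID1" refers to identifier 1 in the default database, "litibp-ID2" to identifier 2 in litibp); treating everything before the first hyphen as the database name is an assumption, and split_key is a hypothetical helper.

```python
def split_key(key):
    """Split a LaTeX citation key into (database or None, reference identifier).

    "ID1" -> (None, "1"), "litibp-ID2" -> ("litibp", "2").
    Assumption: the database prefix is everything before the first hyphen.
    """
    db, sep, ref = key.partition("-")
    if not sep:
        # No prefix: the database is the one passed to the processing application.
        db, ref = None, key
    # Strip the "ID" prefix in front of the identifier.
    if ref.startswith("ID"):
        ref = ref[2:]
    return db, ref


print(split_key("ID1"))         # (None, '1')
print(split_key("litibp-ID2"))  # ('litibp', '2')
```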
The LaTeX \bibliography command takes as an argument the name of the intermediate bibliography file without the extension. A simple choice would be the basename of your LaTeX document.
Note: Even if you pull references from different refdb databases, you still specify only one bibliography file in the \bibliography command of your LaTeX document, because refdb consolidates all cited references into one bibliography file.
Create the auxiliary file
Run the latex interpreter with the basename of your document (foo.tex) as an argument:
~$ latex foo
latex will create, among other files, foo.aux. latex stores all sorts of information in these auxiliary files for later use in subsequent runs. The interesting part for us is the list of citations.
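LaTeX records every \cite and \nocite in the .aux file as a \citation{...} line, with multiple keys separated by commas. The Python sketch below pulls that list out, which is essentially the information refdbib needs from the auxiliary file. The function name and the sample .aux text are illustrative only.

```python
import re

# LaTeX records each \cite and \nocite in the .aux file as \citation{key1,key2,...}.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text):
    """Return the citation keys found in a LaTeX .aux file, in first-seen order."""
    keys = []
    for match in CITATION_RE.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


# A made-up fragment of foo.aux.
aux = "\\relax\n\\citation{ID1}\n\\citation{litibp-ID2,ID1}\n\\bibdata{foo}\n"
print(cited_keys(aux))  # ['ID1', 'litibp-ID2']
```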
Create the intermediate bibliography file
Now refdb enters the stage. We process the auxiliary file to create a BibTeX bibliography tailored to our document. Either we do it manually:
~$ refdbib -d mybib -S name -t bibtex foo.aux > foo.bib
Or we use the runbib shell script:
~$ runbib -d mybib -S name -t bibtex foo
Remember that the basename of the file that receives the bibliographic information (foo.bib in our example) must match the name given in the \bibliography command of the LaTeX document.
The resulting bibliography file will contain all references that were requested from the LaTeX document. If you add more citations to this document, you have to run refdbib again to update the intermediate bibliography file (it won't hurt if you remove citations from your LaTeX document, though).
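To find out whether the intermediate file is out of date, you can compare the cited keys with the entries it contains. The Python sketch below does this with a simple pattern for BibTeX entry headers such as "@ARTICLE{ID1,". It assumes that the keys refdbib writes are identical to the keys used in \cite; missing_entries and the sample entry are made up for illustration. A non-empty result means you should run refdbib (or runbib) again.

```python
import re

# Matches the key of a BibTeX entry header such as "@ARTICLE{ID1," (any entry type).
ENTRY_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def missing_entries(cited, bib_text):
    """Return cited keys without an entry in the intermediate .bib file.

    Assumption: the keys refdbib writes match the keys used in \\cite.
    """
    present = set(ENTRY_KEY_RE.findall(bib_text))
    return [key for key in cited if key not in present]


# A made-up entry standing in for refdbib output.
bib = "@ARTICLE{ID1,\n  AUTHOR = {Miller, A.},\n  YEAR = {1992}\n}\n"
print(missing_entries(["ID1", "litibp-ID2"], bib))  # ['litibp-ID2']
```

Removed citations never show up in this check, which matches the observation above that stale extra entries do no harm.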
Note: For the sake of consistency with bibtex, it is possible (though not necessary) to specify the auxiliary file without the .aux extension (foo in the above example).
Run bibtex
From here, everything runs as you are used to from LaTeX/BibTeX:
~$ bibtex foo
Run latex
Run latex on your LaTeX document at least twice to get all references right:
~$ latex foo && latex foo
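The complete LaTeX cycle described in this section can be collected in a small build script. The Python sketch below only uses the commands and options shown above; the function names are hypothetical, "name" stands for your bibliography style as in the examples, and the commented-out call requires latex, bibtex and refdb to be installed.

```python
import subprocess


def latex_build_steps(basename, database, style, use_runbib=True):
    """Return (argv, stdout_file) pairs for the LaTeX/BibTeX/refdb cycle above."""
    if use_runbib:
        bib_step = (["runbib", "-d", database, "-S", style, "-t", "bibtex", basename], None)
    else:
        # refdbib writes to stdout, which is redirected to BASENAME.bib.
        bib_step = (["refdbib", "-d", database, "-S", style, "-t", "bibtex",
                     basename + ".aux"], basename + ".bib")
    return [
        (["latex", basename], None),   # writes BASENAME.aux
        bib_step,                      # creates BASENAME.bib from the database
        (["bibtex", basename], None),  # creates BASENAME.bbl
        (["latex", basename], None),
        (["latex", basename], None),
    ]


def run_steps(steps):
    """Run each step in order and stop at the first failing command."""
    for argv, stdout_file in steps:
        if stdout_file is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


# Example (requires latex, bibtex and refdb to be installed):
# run_steps(latex_build_steps("foo", "mybib", "name"))
```

Keeping the step list separate from the runner makes it easy to print or inspect the commands before anything is executed.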