This section explains how to prepare documents for use with refdb and how to generate and process the bibliographies. First we'll look at SGML and XML documents, then at LaTeX documents.
DocBook (SGML and XML) and TEI (XML) documents and their refdb bibliographies share many features, so they are treated together in this section. We'll cover how you specify citations, how you generate the bibliography, and how you transform the final document.
The output of the refdbib application will be a bibliography element that contains all required references. You can redirect the output into a file and include this file as an external entity at the spot where your bibliography should appear. To achieve this you need two modifications in your document:
Extend the document type declaration at the beginning of your document to declare the external entity. The first example is from a DocBook SGML document:
```sgml
<!DOCTYPE BOOK PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY bibliography SYSTEM "foo.bib.sgml">
]>
...
```
The second example shows a TEI XML document:
```xml
<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN"
  "http://www.tei-c.org/P4X/DTD/tei2.dtd" [
<!ENTITY % TEI.general 'INCLUDE'>
<!ENTITY % TEI.names.dates 'INCLUDE'>
<!ENTITY % TEI.linking 'INCLUDE'>
<!ENTITY % TEI.XML 'INCLUDE'>
<!ENTITY bibliography SYSTEM "refdbtest.bib.xml">
]>
...
```
The name of the entity is of course yours to choose, but using "bibliography" as in this example is pretty descriptive.
Include the bibliography at the desired spot:
```sgml
...
&bibliography;
...
```
You need to make sure that the included chunk of text is valid at the point where you include it. DocBook SGML and XML bibliographies are generated as bibliography elements; TEI XML bibliographies are wrapped in div elements.
Creating citations and bibliographies in SGML or XML documents with refdb is very similar to what you would do if you had to code the bibliographies manually, but without the sweat. First you create the citations. Each citation consists of one or more bibliographic references in the text, each of which points to one particular entry in the bibliography. Then you create a bibliography for all cited publications (and possibly some more). You would usually also want to create functional links from the citations to the corresponding bibliography entries, which act as hyperlinks in suitable output formats like HTML or PDF. In real life, you would probably jump back and forth, adding a bibliography entry whenever you add a new citation, and inventing suitable ID values for your bibliographic link targets as needed.
Note: The distinction made here between a citation and a bibliographical reference may sound like nitpicking, but it will be important when we deal with citations that contain more than one bibliographical reference.
refdb requires a slightly more formalized approach. You have to stick to a particular syntax when you create the citations, but the good news is that refdb does almost all of the rest. You will usually also create the citations first and let refdb create the bibliography just before you are ready to transform the first draft.
refdb allows two different notations for citations: a short notation and a full notation.
The short notation is, as the name implies, a lot faster to type and thus more convenient, but it requires an additional preprocessing step that imposes some small restrictions on the way you write your documents (please see the section about refdbxp for details about these restrictions). The preprocessing of documents using the short notation also automates the handling of first and subsequent citations of a bibliographic entry, and it automatically creates the ID values used in multiple citations. The short notation currently does not support multiple databases per document.
The short notation is fully valid SGML or XML code, without any extensions of the original DTDs. You can use all sorts of SGML or XML processing tools on such documents.
The full notation offers full control but requires a lot more typing and thinking. It does not require a preprocessing step before the transformation, though. You need to take care of the issue of first and subsequent citations of a reference, and you have to manually generate ID values for use in multiple citations. You can include references taken from several databases.
Just like the short notation, the full notation is also fully valid SGML or XML code, without any extensions of the original DTDs.
First we'll have a look at the short notation, before we get into the gruesome details of the full notation. Keep in mind that the refdbxp application interconverts the short and the full notation. You can convert your document back and forth as often as you wish, so you're not limited to the notation that you initially choose. In fact, you can mix both notations in a single document.
In DocBook documents (both SGML and XML), citations are encoded as citation elements. To distinguish these from citation elements that are not meant to be processed by refdb, set the role attribute to REFDB in all caps. Each citation element contains one or more references, separated by semicolons. The trailing semicolon after the last reference is optional, so the following citations are equivalent:
```xml
<citation role="REFDB">2;5;9</citation>
<citation role="REFDB">2;5;9;</citation>
```
The list of references can either contain numerical ID values, as in the examples above, or alphanumeric citation keys like in the following example:
```xml
<citation role="REFDB">miller1999;jones2001</citation>
```
The corresponding syntax for TEI XML documents is quite similar, except that we abuse the general-purpose seg element and tag it for use with refdb by setting its type attribute to REFDBCITATION in all caps:
```xml
<seg type="REFDBCITATION" part="N" TEIform="seg">2;5;9</seg>
```
Again, you can use citation keys instead of the ID values shown in the example above.
The examples shown above will be rendered as "regular" citations. In addition to this you can request author-only or year-only citations. These come in handy if you want to write something like: Jones et al. reported recently (2001)... Both the authors (Jones et al.) and the year (2001) need to be encoded as individual citations as shown in the following example:
```xml
<para><citation role="REFDB">A:jones2001</citation> reported recently
<citation role="REFDB">Y:jones2001</citation> ...</para>
```
You may have guessed that the prefix "A:" tags a citation as an author-only citation and that the prefix "Y:" means year-only.
Note: These prefixes tag the whole citation, not a particular reference in the citation. Therefore the prefix must be the first thing right after the start tag. Multiple citations using the author-only or year-only style would make no sense anyway.
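To make the rules above concrete, here is a minimal Python sketch of how the content of a short-notation citation could be split up: an optional A: or Y: prefix for the whole citation, followed by semicolon-separated references with an optional trailing semicolon. The function name parse_short_citation is hypothetical; this illustrates the syntax only and is not refdbxp's implementation.

```python
from typing import List, Optional, Tuple


def parse_short_citation(content: str) -> Tuple[Optional[str], List[str]]:
    """Split short-notation citation text into (mode, references).

    mode is "A" (author-only), "Y" (year-only) or None (regular citation).
    Hypothetical helper for illustration, not part of refdb.
    """
    text = content.strip()
    mode = None
    # The prefix tags the whole citation, so it can only appear right
    # at the start of the element content.
    if text[:2] in ("A:", "Y:"):
        mode, text = text[0], text[2:]
    # Semicolons separate references; a trailing semicolon produces an
    # empty last item, which is dropped.
    refs = [ref.strip() for ref in text.split(";") if ref.strip()]
    return mode, refs


print(parse_short_citation("2;5;9"))        # (None, ['2', '5', '9'])
print(parse_short_citation("2;5;9;"))       # (None, ['2', '5', '9'])
print(parse_short_citation("A:jones2001"))  # ('A', ['jones2001'])
```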
This is about all you need to know about the short notation. Just don't forget that documents containing citations in short notation must be preprocessed with refdbxp before you transform them to one of the output formats.
The full notation is a lot more complex than the short notation described above. Unless you have specific reasons to write citations in full notation from scratch, it is advisable to use the short notation and preprocess your documents with refdbxp. The output created by this utility is the full notation described in this section.
The particular syntax of citations and bibliographic references is necessary for two reasons. First, we have to tell refdb which bibliographic database entry we want to reference (and possibly from which database). Second, we need to encode which type of citation or reference we want. The exact markup depends on the DTD that your document uses, but the basics are the same.
In both DocBook and TEI documents, these two bits of information are encoded in attributes of elements that create a link from the reference to the bibliographic entry. In order to handle multiple citations correctly, these link elements need to be inside a wrapper element. For a DocBook document, basic citations therefore look like this:
```sgml
<citation role="REFDB"><xref linkend="ID1X"></citation>
<citation role="REFDB"><xref linkend="LITIBP-ID2X"></citation>
```
Note: This and the following DocBook examples are given in SGML notation. Keep in mind two things when working with XML documents:
The empty xref elements need a closing slash as in <xref linkend="ID2X"/>.
All attribute values relevant to refdb must be in uppercase. This restriction is imposed by the way citations are currently extracted from the document. It may be dropped in later versions though.
The corresponding syntax in a TEI XML document looks like this:
Note: You don't have to worry about the attributes in the example which are not mentioned in the explanations. These are TEI default attributes which do not have anything to do with refdb (your XML editor will most likely create them automatically for you).
There are several ways to render citations and bibliographic references in the text. You select the one you need with a trailing capital letter after the reference ID (the "X" in the above examples). refdb will create several preformatted strings in the bibliography file, and you link to the one you want by choosing the proper postfix. These preformatted strings serve different purposes, as shown in the following table:
Table 10-1. Bibliographic reference types
Postfix | Purpose |
---|---|
X | The most common case. This is the first occurrence of a reference which is to be displayed outside the flow of the text. In numerical citation schemes this will be something like "(2)", in author-year citation schemes this may be rendered as "(Miller et al., 1992)". |
S | This is the same as X, but for a subsequent occurrence of the same reference. This distinction is important for some author-year citation schemes that print the full (or at least a longer) author list at the first occurrence and an abbreviated one at all subsequent occurrences of the same reference. |
A | This is the first occurrence of a reference that displays the author list inside the flow of the text, as in "Miller et al. reported recently (2001)...". |
Q | This is the same as A, but for subsequent occurrences of the same reference. |
Y | This type complements the author-only references mentioned above. In numerical citation schemes this is usually rendered like a normal reference, e.g. as "(2)", but in author-year citation schemes usually only the publication date is rendered, as in "(2001)". |
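When you write the full notation by hand, you have to track first and subsequent occurrences yourself. The Python sketch below shows that bookkeeping for the X and S postfixes only. It assumes the linkend layout seen in the examples above (an optional DATABASE- prefix, then "ID", the reference ID and the postfix) and uppercases everything because refdb-relevant attribute values must be uppercase. make_linkend and assign_postfixes are hypothetical names; refdbxp does this for you with the short notation.

```python
from typing import Iterable, List, Optional, Tuple


def make_linkend(ref_id: str, database: Optional[str], postfix: str) -> str:
    """Build a linkend value such as ID1X or LITIBP-ID2S (assumed layout)."""
    prefix = f"{database}-" if database else ""
    # refdb-relevant attribute values must be uppercase.
    return f"{prefix}ID{ref_id}{postfix}".upper()


def assign_postfixes(citations: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Give the first occurrence of each reference X and later ones S."""
    seen = set()
    linkends = []
    for ref_id, database in citations:
        # Treat the same ID in different databases as different references.
        key = ((database or "").upper(), ref_id.upper())
        postfix = "S" if key in seen else "X"
        seen.add(key)
        linkends.append(make_linkend(ref_id, database, postfix))
    return linkends


print(assign_postfixes([("1", None), ("2", "litibp"), ("1", None)]))
# ['ID1X', 'LITIBP-ID2X', 'ID1S']
```

The A and Q postfixes follow the same first/subsequent pattern for author-only references. They are left out here to keep the sketch short.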
Note: The exact formatting of these references, e.g. which citation style is used or which brackets surround the reference, is controlled by the style specification for a particular publication or publisher. This takes effect when you generate the bibliography and transform the final document.
An additional twist comes into play if you have multiple citations, i.e. a citation that contains more than one bibliographic reference. In most cases, all references are displayed inside of one pair of brackets. Some numerical citation styles require that bibliographic references with consecutive numbers be formatted as ranges within the same citation.
Note: Formatting consecutive numbers as ranges breaks the links from the individual references that make up a range to their bibliographic entries. Any generated hyperlinks will therefore point to one common target for all members of a multiple citation. If this is not desired (e.g. to keep the links alive in an HTML presentation of a scientific document), you can override this behaviour during the transformation of the final document.
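The following Python sketch shows why ranges and per-reference links don't mix. It collapses sorted reference numbers into ranges, so that 1, 2 and 3 turn into the single string "1-3" and reference 2 no longer has its own piece of text to link from. The threshold of three consecutive numbers (min_run) is an assumption. Real bibliography styles set their own rules, and format_ranges is a hypothetical name.

```python
from typing import Iterable


def format_ranges(numbers: Iterable[int], min_run: int = 3) -> str:
    """Collapse runs of at least min_run consecutive numbers into 'a-b'."""
    nums = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        # Extend the run while the next number is consecutive.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        run = nums[i:j + 1]
        if len(run) >= min_run:
            parts.append(f"{run[0]}-{run[-1]}")
        else:
            parts.extend(str(n) for n in run)
        i = j + 1
    return ", ".join(parts)


print(format_ranges([7, 1, 14, 2, 3, 8]))             # 1-3, 7, 8, 14
print(format_ranges([7, 1, 14, 2, 3, 8], min_run=2))  # 1-3, 7-8, 14
```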
In order to format these cases properly, you need to include a dummy element whose sole purpose is to provide a link to an element that contains the combined, preformatted citation string. This is shown for a DocBook document in the following example.
```sgml
<citation role="REFDB">
<xref endterm="IMTHEFIRST" linkend="ID1" role="MULTIXREF">
<xref linkend="ID1X">
<xref linkend="ID14X">
<xref linkend="ID7X">
</citation>
```
Note: The sequence of the xref elements that encode the actual references may be important. Depending on the bibliography style used for the document transformation, the references may be displayed in the sequence as they were entered, or they may be rearranged according to the sequence of the bibliographic entries in the finished bibliography.
Also keep in mind that all attribute values must be in uppercase for the same reason as stated above.
The corresponding TEI citation is a little bit simpler:
Unless you have good reasons not to, you should use the runbib shell script to generate the bibliography. This script greatly simplifies the task and offers a common interface for all supported document types. The following subsection explains how to use it. If you'd rather do it the hard way (or want to peek under the hood), the manual steps are explained further down.
Let's assume you have a DocBook SGML document foo.sgml and want to submit it to the "Journal of Irreproducible Results". We further assume that the bibliography style for this famous periodical is stored in your database under the name "J.Irrep.Res." (see Manage bibliography styles to learn how it gets there). All your bibliography entries (at least those referenced without an explicit database name) are stored in the database mybib. Start the script from the directory that contains your document with the following command:
```shell
~$ runbib -d mybib -S "J.Irrep.Res." -t db31 foo.sgml
```
For a similar TEI XML document bar.xml you would run:
```shell
~$ runbib -d mybib -S "J.Irrep.Res." -t teix bar.xml
```
In both cases you will end up with a bibliography file (foo.bib.sgml and bar.bib.xml, respectively) as well as with a stylesheet (J.Irrep.Res.dsl) or a set of stylesheets (J.Irrep.Res.fo.xsl and J.Irrep.Res.html.xsl), respectively.
Note: Don't worry if you are greeted by a list of (Open)Jade errors complaining about missing elements when you first run this script on a particular document. Your document contains a number of crosslinks that point to elements that do not exist yet - you use runbib precisely to create these elements (you thus face a classic bootstrapping problem). As soon as the bibliography is created, these error messages should go away. Later you will only get an error message for each bibliographic entry that was added since the last time you ran runbib.
The following steps do exactly what the runbib script does, just with more to type. The only benefit of the hard way is that you have a chance to fiddle with the intermediate XML file which contains the list of bibliographic entries that should go into the bibliography. You can add further entries to extend the bibliography if you want to include uncited publications. The following procedure was written with a DocBook SGML document in mind, but transferring the commands to XML documents is straightforward. However, when working with XML documents there are additional steps required as outlined below.
Extract the list of bibliographic references
Use Jade or OpenJade with the citations.dsl stylesheet to create a list of the reference IDs (provide full paths as needed):
```shell
~$ jade -t sgml -d citations.dsl /usr/lib/sgml/declaration/docbook-3.1.dcl foo.sgml > foo.id.xml
```
Be prepared for a long list of "missing ID" error messages. These occur because the elements that the xref elements in the citations point to do not exist yet; they will be generated in the refdb bibliography output. If your document contains more than 200 citations, you'll have to raise Jade's maximum error limit (its -E option) in order to obtain all IDs in the first pass. After the first complete pass (including the steps outlined below), Jade will only complain about citations that you have inserted since the last run.
The output is a simple XML file that contains the information about all citation and xref elements with their relevant attributes. It is absolutely legal to extend this file with additional citation elements to specify references which are not cited but nonetheless should appear in the bibliography.
Unfortunately, both Jade and OpenJade don't get that Doctype line quite correct. Both forget to insert a space between the public and the system identifier, thus leaving you with a not well-formed document. Fire up your favourite editor and fix this line manually (insert a space between the two consecutive quotation marks on line 2).
If you edit this intermediate XML file (that is, if you do more than just fixing the Doctype line), you should make sure that the result is still valid according to the CitationList XML DTD. refdb uses a non-validating parser to read this file so deviations from the DTD may slip through undetected and may have undesired consequences. The intermediate XML file carries the SYSTEM identifier of the CitationList XML DTD in the document type declaration. You may have to adapt the stylesheet citations.dsl to use the correct path for your local system.
The following command line can be used to validate the document with (o)nsgmls (change the path to the XML declaration as necessary):
```shell
~$ nsgmls -wxml -s /usr/lib/sgml/declaration/xml.dcl foo.id.xml
```
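If you run this step often, the manual Doctype repair described above can be scripted. The following minimal Python sketch assumes the defect looks exactly as described: two consecutive quotation marks on line 2 where the public and system identifiers meet. It also assumes the file is UTF-8 encoded. fix_doctype and fix_doctype_file are hypothetical helpers, not part of refdb.

```python
from pathlib import Path


def fix_doctype(text: str) -> str:
    """Insert the missing space between the two identifiers on line 2."""
    lines = text.split("\n")
    if len(lines) > 1 and '""' in lines[1]:
        # Only the first "" on that line is the junction of the identifiers.
        lines[1] = lines[1].replace('""', '" "', 1)
    return "\n".join(lines)


def fix_doctype_file(path: str) -> None:
    """Repair the file in place, e.g. fix_doctype_file("foo.id.xml")."""
    p = Path(path)
    # Assumes UTF-8; adjust if your Jade output uses another encoding.
    p.write_text(fix_doctype(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Running the function on an already repaired file leaves it unchanged, so it is safe to run after every extraction.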
Create the bibliography file
```shell
~$ refdbib -d mybib -S "J.Irrep.Res." foo.id.xml > foo.bib.sgml
```
This assumes that your reference database is called "mybib" and that you try to publish your paper in a journal that accepts the style with the name "J.Irrep.Res.".
In addition to the bibliography file, refdbib will also create a DSSSL script containing the style specification. This file is a customized driver file that builds on the RefDB-DocBook driver files and sets a couple of variables specific to the given bibliography style.
Post-processing
This step is only required for XML documents. First we have to bring the stylesheets into shape, and if it is a TEI document, we'll also have to transform the bibliography file itself.
refdbib creates a general-purpose XSL stylesheet which we need to turn into one FO and one HTML stylesheet. Create two copies of the file. If the stylesheet was e.g. J.Biol.Chem.xsl, you need one copy named J.Biol.Chem.fo.xsl and one copy named J.Biol.Chem.html.xsl. Scan the files for an import statement whose href attribute is surrounded with two "<!-- REFDBSTYLESHEET -->" comments. The value of this attribute must be set to the full path of the corresponding original stylesheet (DocBook FO or HTML, or TEI FO or HTML).
If you're working on a TEI XML document, you'll have to transform the bibliography file itself. This is a DocBook SGML document and can be transformed easily with Jade/OpenJade and the bibdb2tei.dsl stylesheet.
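The stylesheet copying described in this step can also be scripted. The following Python sketch writes the .fo.xsl and .html.xsl copies and rewrites the first href attribute found between the two REFDBSTYLESHEET comments. The exact layout of that import statement is an assumption, so check the generated files before you rely on this. The function names and the paths in the usage line are hypothetical placeholders.

```python
import re
from pathlib import Path

MARKER = "<!-- REFDBSTYLESHEET -->"
# Everything between the two markers; DOTALL lets the region span lines.
REGION = re.compile(re.escape(MARKER) + r"(.*?)" + re.escape(MARKER), re.DOTALL)
HREF = re.compile(r'href\s*=\s*"[^"]*"')


def patch_stylesheet(text: str, target: str) -> str:
    """Point the first href found between the markers at target."""
    match = REGION.search(text)
    if match is None:
        raise ValueError("REFDBSTYLESHEET markers not found")
    # A function replacement keeps backslashes in target from being
    # interpreted as regex escapes.
    region, count = HREF.subn(lambda _m: f'href="{target}"', match.group(1), count=1)
    if count == 0:
        raise ValueError("no href attribute between the REFDBSTYLESHEET markers")
    return text[: match.start(1)] + region + text[match.end(1):]


def make_variants(stylesheet: str, fo_target: str, html_target: str) -> None:
    """Write <name>.fo.xsl and <name>.html.xsl next to <name>.xsl."""
    src = Path(stylesheet)
    text = src.read_text(encoding="utf-8")
    for kind, target in (("fo", fo_target), ("html", html_target)):
        out = src.with_name(f"{src.stem}.{kind}.xsl")
        out.write_text(patch_stylesheet(text, target), encoding="utf-8")
```

A call such as make_variants("J.Biol.Chem.xsl", "/path/to/fo/docbook.xsl", "/path/to/html/docbook.xsl") would produce J.Biol.Chem.fo.xsl and J.Biol.Chem.html.xsl. Replace the placeholder paths with the locations of your DocBook or TEI stylesheets.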
Finally you can transform the document to create printable or HTML output. In order to get the formatting of the citations and bibliography entries right you have to use the refdb driver files for the DocBook or TEI stylesheets.
In addition to the general modifications in these driver files, we have to apply modifications specific to the particular reference style. Therefore you have to specify the DSSSL or XSL style specification file that was created in the previous step. For convenience, it is recommended to use the supplied refdbjade and refdbxml scripts, which were designed for this task, for DSSSL and XSL transformations respectively:
```shell
~$ refdbjade -t html -s J.Irrep.Res.dsl foo.sgml
```
```shell
~$ refdbxml -t pdf -s J.Irrep.Res.fo.xsl bar.xml
```
If you want to change the bibliography style of your document, all you need to do is rerun runbib and refdbjade or refdbxml with the new parameters. No changes to your document source are necessary.
Note: If you want to create a bibliography for each part of a book or for each chapter, the procedure is not much different. The simplest approach is to keep the parts or chapters in individual files and process these individually as described above for the whole document. You'll get several bibliography files that you can include into the corresponding document source files.
Now that you know all necessary steps to process SGML and XML documents, it's about time to reveal that there is a simple shortcut if you can live with some minor restrictions. The refdbnd script helps you to start new SGML or XML projects and sets up a Makefile to process your document.
Start the script in a clean subdirectory by typing refdbnd. The script will start in interactive mode and ask a couple of questions. You'll have to specify the basename of your project, the SGML or XML document type declaration you'd like to use, the top-level element, the refdb database that holds the references which you intend to cite, and the name of the bibliography style to be used with this document. The script will then create a file <basename>.short.[sgml|xml]. The ".short" reminds you that the Makefile assumes you will be using the short notation for citations. It will also create a Makefile which is set up to perform the necessary steps to create all sorts of available formatted output.
Once you have written your document, including a few citations and a reference to the external bibliography file as explained in the previous sections, you can use the Makefile to process your document. You may know how to use Makefiles anyway, but if not, here are the main properties:
A Makefile is an input file for the program make. If your present working directory contains a file called Makefile and you just run make, the program will process that file.
Makefiles define one or more targets. A target defines which kind of output you want to create. In order to run a specific target you pass this target as an argument to make.
Makefiles define dependencies. Unlike simple shell scripts that usually run a complex series of commands from start to end, make checks for each target individually whether one of the files or targets it depends on is outdated. This way, only the minimum number of processing steps required to create the desired output will run.
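The dependency check described above boils down to comparing modification times. The following Python sketch is a simplified model of that idea. A target is rebuilt if it is missing or if any prerequisite is newer. Real make also walks the dependency chain recursively and applies its own rules for missing prerequisites, which this sketch omits. needs_rebuild is a hypothetical name used only for illustration.

```python
import os
from typing import Iterable


def needs_rebuild(target: str, prerequisites: Iterable[str]) -> bool:
    """Simplified make rule: missing target or newer prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # A prerequisite that is missing raises FileNotFoundError here;
    # real make would instead try to build it or report an error.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

For example, needs_rebuild("foo.pdf", ["foo.sgml", "foo.bib.sgml"]) becomes true as soon as you edit the document or regenerate its bibliography.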
The Makefiles created by refdbnd offer the following targets:
This target generates a PDF file from your source document. PDF is a widely accepted document format with free viewers for essentially all current operating systems.
This runs all required commands to create HTML output, viewable with any web browser. Depending on your local setup, the output will be chunked into a collection of HTML files.
This target generates a Rich Text Format (RTF) file. RTF is a plain-text markup format that serves as an interchange format between word processors and is understood by most current ones, including MS Word, WordPerfect, and OpenOffice/StarOffice.
This target is only available for SGML documents. It creates a PostScript document from your source. PostScript is the traditional printing format on Unix systems and can be printed directly on PostScript printers. Viewers are available for all current operating systems.
The Makefile also offers a few more targets. For each of the above targets there is a corresponding '<target>dist' target which creates a .tar.gz archive of the output document. The target 'all', which is also the default if you don't specify a target to make, builds all available output formats. Accordingly, the target 'dist' creates all archives. And finally, the target 'clean' removes all intermediate files and returns your directory to the original state.
For example, to create a formatted PDF document <basename>.pdf from your <basename>.short.sgml file you'd type make pdf. make will first convert the short-style citations to the full style using the refdbxp tool. Then it will generate the bibliography and the stylesheet driver files by running refdbib. Finally it will run the refdbjade script to create the PDF output.
The refdbnd-generated Makefiles should be sufficient for the average document. However, feel free to modify them in order to adapt them to specific needs. For example you can specify a different style in order to switch your output to a different citation and bibliography style. make also allows you to override variable settings on the command line. E.g. if you want to output your document using a different bibliography style without making it the permanent default, invoke make like this:
```shell
~$ make pdf stylename="Eur.J.Pharmacol."
```
Note: You'll have to remove all intermediate files by running make clean first before you can switch to a different bibliography style.
While refdb works out of the box with DocBook SGML/XML and TEI XML documents, it is by no means limited to these document types. The only native bibliography export format is DocBook (in a form suitable for both SGML and XML documents). TEI bibliographies are actually generated from this output with an SGML-to-XML transformation. The DocBook output has sufficient granularity to allow this and possibly other transformations. If you want refdb to work with other document types, you have to do the following:
Either extend citations.dsl or create an additional stylesheet suitable to extract a list of citations and references conforming to the CitationList XML DTD.
Provide a DSSSL stylesheet to transform the DocBook SGML bibliography output to your target document type.
Modify your DSSSL or XSL stylesheets (or better, provide suitable driver files) to make use of the extended formatting information both in the bibliography and in the refdb-created driver files.
refdb integrates quite nicely with the LaTeX/BibTeX system. If you previously used a flat text file to store your BibTeX references, you will notice that there is only one additional command to run when you process your source document. Instead of keeping all of your references in a text file, refdbib will retrieve only the required references from the SQL database and store them in an intermediate text file.
Prepare the document
Use the LaTeX commands \cite and \nocite to include the references as usual. The extended commands from the natbib package should work as well. All these commands take an identifier for the reference as an argument. Just like in DocBook documents, these identifiers come in two flavours. Either you use the same database for all references in the text; then you just specify the ID of the reference and tell the processing application which database to use. Or you specify the database name with each reference; in this case you can pull the references from different databases into the same document. The two versions look like this:
```latex
\cite{IDMiller1999}
\cite{litibp-IDMyers2001}
```
The first version cites the reference with the citation key "Miller1999" in the database passed to the processing application as an argument. The second form cites the reference with the citation key "Myers2001" in the database "litibp".
The LaTeX \bibliography command takes as an argument the name of the intermediate bibliography file without the extension. A simple choice would be the basename of your LaTeX document.
Note: Even if you pull references from different refdb databases, your LaTeX document still names only one bibliography file in its \bibliography command, because refdb consolidates all cited references into one bibliography file.
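The two \cite forms above differ only in the optional database prefix. The following Python sketch splits such a key into a database name and a reference ID. It assumes an optional database name without hyphens, then a hyphen, then "ID" followed by the reference ID or citation key. This rule is inferred from the two examples and is not taken from refdbib, and split_cite_key is a hypothetical name.

```python
import re
from typing import Tuple

# Optional "database-" prefix, then the literal "ID", then the reference.
CITE_KEY = re.compile(r"(?:(?P<db>[^-]+)-)?ID(?P<ref>.+)")


def split_cite_key(key: str, default_db: str) -> Tuple[str, str]:
    """Return (database, reference) for a refdb-style LaTeX citation key."""
    match = CITE_KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"not a refdb citation key: {key!r}")
    return match.group("db") or default_db, match.group("ref")


print(split_cite_key("IDMiller1999", "mybib"))        # ('mybib', 'Miller1999')
print(split_cite_key("litibp-IDMyers2001", "mybib"))  # ('litibp', 'Myers2001')
```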
Create the auxiliary file
Run the latex interpreter with the basename of your document (foo.tex) as an argument:
```shell
~$ latex foo
```
latex will create, among other files, foo.aux. latex stores all sorts of information in these auxiliary files for later use in subsequent runs. The interesting part for us is the list of citations.
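For each \cite or \nocite, LaTeX writes a \citation{...} line with comma-separated keys to foo.aux. The \bibliography command becomes a \bibdata{...} line, whose value is the basename that the intermediate .bib file must match. The following minimal Python sketch pulls both out of an .aux file to show what refdbib has to work with. read_aux is a hypothetical name, and the sketch is not refdbib's parser.

```python
import re
from typing import List, Optional, Tuple

CITATION = re.compile(r"\\citation\{([^}]*)\}")
BIBDATA = re.compile(r"\\bibdata\{([^}]*)\}")


def read_aux(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (unique citation keys in first-cited order, bibdata name)."""
    keys: List[str] = []
    for group in CITATION.findall(text):
        # One \citation line may carry several comma-separated keys.
        for key in group.split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    match = BIBDATA.search(text)
    return keys, (match.group(1) if match else None)
```

Calling read_aux(open("foo.aux").read()) on the example document would list keys such as IDMiller1999 and litibp-IDMyers2001, together with the name given in \bibliography.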
Create the intermediate bibliography file
Now refdb enters the stage. We process the auxiliary file to create a BibTeX bibliography tailored to our document. Either we do it manually:
```shell
~$ refdbib -d mybib -S name -t bibtex foo.aux > foo.bib
```
Or we use the runbib shell script:
```shell
~$ runbib -d mybib -S name -t bibtex foo
```
Remember that the basename of the file that receives the bibliographic information (foo.bib in our example) must match the name given in the \bibliography command in the LaTeX document.
The resulting bibliography file will contain all references that were requested from the LaTeX document. If you add more citations to this document, you have to run refdbib again to update the intermediate bibliography file (it won't hurt if you remove citations from your LaTeX document, though).
Note: For the sake of consistency with bibtex, it is possible (though not necessary) to specify the auxiliary file without the .aux extension (foo in the above example).
Run bibtex
From here, everything runs as you are used to from LaTeX/BibTeX:
```shell
~$ bibtex foo
```
Run latex
Run latex on your LaTeX document at least twice to get all references right:
```shell
~$ latex foo && latex foo
```