bib2ris

bib2ris converts BibTeX bibliography files into RIS files. The filter is implemented using Greg Ward's outstanding btparse library. bib2ris can also run as a CGI app to provide BibTeX-to-RIS data conversion in the refdb web interface.

Unfortunately the concepts underlying BibTeX and RIS bibliographic data are quite different so that BibTeX data do not readily lend themselves to a clean conversion to the RIS format. This is not meant as an excuse to provide a bad filter but you should be aware that a few compile-time assumptions have to be made in order to get reasonable results. In any case, as the data models differ considerably, a loss-free interconversion between the two data types is not possible: If you convert a BibTeX bibliography file to RIS and then back, the result will differ considerably from your input.

There are basically two ways to handle BibTeX data with refdb:

  1. Convert all entries to plain text. This will allow you to work with your data just as with "native" RIS data, i.e. all field values in the output of the refdb backends will be plain text as well.

  2. Keep the TeX formatting in the entries. This will allow you to make use of TeX commands and formatting stuff in the BibTeX bibliography output, but it'll be a bit strange to work with these data in the rest of refdb. When formulating queries you will have to take account of the TeX magic, and this stuff will also show up in all other output (screen, HTML, DocBook etc).

There may be better support for this situation in future releases of refdb. Currently the rule of thumb is: If you're interested only in BibTeX bibliographies, keep the formatting. If you're interested in generating both BibTeX and DocBook bibliographies, or if you're mainly interested in maintaining an easily accessible reference database, strip off the TeX formatting. This is best done with the supplied tex2mail script, which will be discussed shortly.

That said, you may still be interested in seeing how bib2ris works.

Starting bib2ris

Start bib2ris with the command:

bib2ris [-e log-destination] [-h] [-j] [-l log-level] [-L log-file] [-q] [-s separator] [-v] [-y confdir] [file]

Remember that you don't have to specify all these options each time if you define the values in bib2risrc.

The -e option defines the destination of log output. In order for log output to appear at all, the log level has to be specified correctly with the -l option. A log-destination argument of 0 directs log output to stderr, 1 uses the syslog facility, 2 uses a custom log file. For the latter to work you have to specify a log filename with the -L option.

With the -h option bib2ris displays a brief help screen and exits.

Use the -j option to force bib2ris to use "JO" RIS fields in all cases. If this option is not used, bib2ris tries to infer whether a journal name is an abbreviation or not. If the string contains at least one period, "JO" will be used, otherwise "JF" will be used.
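The inference rule described above can be restated as a small shell function. This is only a sketch of the documented rule, not the code bib2ris actually runs; the function name ris_journal_tag is made up for this illustration.

```shell
# Restates the documented rule: a journal name that contains at least
# one period is treated as an abbreviation (JO), anything else as a
# full name (JF). ris_journal_tag is a hypothetical helper name.
ris_journal_tag() {
    case $1 in
        *.*) echo "JO" ;;
        *)   echo "JF" ;;
    esac
}

ris_journal_tag "J. Biol. Chem."
ris_journal_tag "Journal of Biological Chemistry"
ris_journal_tag "J Biol Chem"
```

The last call shows where the rule falls short: an abbreviation written without periods is taken for a full name. If your BibTeX data use this style, the -j option is the way to get "JO" fields anyway.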

The -l option sets the maximum log level that a message may have and still be logged. If you specify a high level (up to 7), all sorts of messages including debug messages are logged. If you specify a low level (down to 0), only critical errors are logged. Specify -1 to disable logging.

The -L option specifies a filename which is used as a custom log file if the -e option is set appropriately.

Note: The underlying btparse library sends some warnings and errors directly to stderr. Currently (i.e. without patching btparse) this behaviour cannot be controlled with the -e, -l, and -L switches. If you want to log these messages to a file as well you will have to employ some shell magic to redirect the output.
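One possible form of this shell magic is a small wrapper that appends everything a command writes to stderr to a file of your choice. The function name run_with_stderr_log and all file names below are placeholders for this sketch; the wrapper itself is plain shell redirection and knows nothing about bib2ris or btparse.

```shell
# Run any command and append whatever it writes to stderr to a file.
# $1 = file that collects stderr, remaining arguments = the command.
# run_with_stderr_log is a hypothetical helper name.
run_with_stderr_log() {
    errfile=$1
    shift
    "$@" 2>> "$errfile"
}

# Intended use (not run here; the file names are placeholders):
#   run_with_stderr_log btparse.err bib2ris -e 2 -l 6 -L bib2ris.log foo.bib > foo.ris
```

With -e 2 the messages of bib2ris itself go to the file given with -L, so the file collecting stderr should end up with the btparse messages only.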

Use the -q option to temporarily switch off the settings in the init files. bib2ris will then use the compile-time defaults unless you specify things with the command line switches (useful for debugging configuration file settings).

The -s option specifies the delimiter which separates individual keywords in a non-standard keyword field. Use the string "spc" for whitespace-separated lists (spaces and tabs).

-v prints the version information and brief licensing information, then exits.

Use the -y option to specify the directory where the global configuration files are.

Note: By default, all refdb applications look for their configuration files in a directory that is specified during the configure step when building the package. That is, you don't need the -y option unless you use precompiled binaries in unusual locations, e.g. by relocating a rpm package.

All other command line parameters will be interpreted as input filenames. bib2ris can read the incoming data either from these files or from stdin. If data are available at stdin, the filename arguments will be ignored. The output is always sent to stdout, so you can either view the result by piping into a pager or redirect the data into a file. Of course it is also possible to directly pipe the result into refdbc but it may be prudent to manually check the output before sending something to refdbc that you may later regret.

The exit code of bib2ris indicates what went wrong in general (the details can be found in the log output). The code is the sum of the following error values:

Table 16-5. bib2ris exit codes

code  explanation
1     general error; includes out of memory situations and invalid command-line options
2     incomplete entry (at least one essential field in an entry was missing)
4     unknown field name
8     unknown publication type
16    invalid BibTeX->RIS type mapping
32    parse error; includes file access errors

As an example, if bib2ris exits with the error code 18, simple math would tell you that there was at least one error in the BibTeX-to-RIS mapping (16), most likely an invalid RIS tag, and at least one incomplete entry (2), the sum of which yields 18.
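If you would rather let the shell do the simple math, the sum can be taken apart with a bitwise AND against each error value. The following is a minimal sketch; the function name explain_bib2ris_exit is made up, and the explanations are shortened from the table above. It only makes sense for exit codes that bib2ris itself computed.

```shell
# Print one line per error value contained in a bib2ris exit code.
# explain_bib2ris_exit is a hypothetical helper name.
explain_bib2ris_exit() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "0: no error"
        return 0
    fi
    # Each line of the list below is "<error value> <explanation>".
    while read -r bit text; do
        if [ $((code & bit)) -ne 0 ]; then
            echo "$bit: $text"
        fi
    done <<'EOF'
1 general error
2 incomplete entry
4 unknown field name
8 unknown publication type
16 invalid BibTeX->RIS type mapping
32 parse error
EOF
}

explain_bib2ris_exit 18
```

For the exit code 18 this prints the lines for 2 and 16. In a script you would typically call it as explain_bib2ris_exit $? right after bib2ris returns.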

Note: Under some (really bad) error conditions the underlying btparse library exits without returning control to bib2ris. In that case the exit code is determined by btparse, not by bib2ris.

The following examples show how bib2ris reads data from stdin or from input files, respectively.

~# bib2ris *.bib | less

This command will convert all .bib files in the current directory and display the result in a pager.

~# bib2ris < foo.bib > foo.ris

This command reads the data via redirection from foo.bib and redirects the output into the file foo.ris.
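If you prefer one RIS file per BibTeX file instead of a single combined output, a loop around the second form will do. The sketch below assumes that your input files end in .bib; the function name convert_each_bib is made up for this illustration.

```shell
# Convert every file given on the command line to a RIS file of the
# same name (foo.bib -> foo.ris) and report non-zero exit codes.
# convert_each_bib is a hypothetical helper name.
convert_each_bib() {
    for bib in "$@"; do
        ris="${bib%.bib}.ris"
        status=0
        bib2ris < "$bib" > "$ris" || status=$?
        if [ "$status" -ne 0 ]; then
            echo "bib2ris exit code $status for $bib" >&2
        fi
    done
}

# Intended use (not run here):
#   convert_each_bib *.bib
```

Converting the files one by one also tells you which file caused a particular exit code, which the combined run cannot do.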

The bib2ris variables

Depending on how bib2ris is run, it will consult one of two configuration files. If it runs as a regular application, the file bib2risrc will be used. If it runs as a CGI application, bib2riscgirc will be used instead. This way you can use different configurations even if the user program and the CGI program run on the same computer.

Table 16-6. bib2risrc

Variable Default Comment
logfile /var/log/bib2ris.log The full path of a custom log file. This is used only if logdest is set appropriately.
logdest 1 The destination of the log information. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile. The latter needs a proper setting of logfile.
loglevel 6 The log level up to which messages will be sent. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages. -1 means nothing will be logged.
abbrevfirst t If this option is set to "t", the first names of all authors and editors will be abbreviated to the initials. If set to "f", the first names will be used as they are found in the BibTeX bibliography file.
listsep ; This is the delimiter which separates individual keywords in a non-standard keyword field. Use the string "spc" for whitespace-separated lists (spaces and tabs).
forcejabbrev f If this is set to "t", journal names will be wrapped in RIS "JO" entries. If it is set to "f", bib2ris will use "JO" entries only if the journal name contains at least one period, otherwise it will use "JF".
maparticle JOUR map the BibTeX article publication type to a RIS type
mapbook BOOK map the BibTeX book publication type to a RIS type
mapbooklet PAMP map the BibTeX booklet publication type to a RIS type
mapconference CHAP map the BibTeX conference publication type to a RIS type
mapinbook CHAP map the BibTeX inbook publication type to a RIS type
mapincollection CHAP map the BibTeX incollection publication type to a RIS type
mapinproceedings CHAP map the BibTeX inproceedings publication type to a RIS type
mapmanual BOOK map the BibTeX manual publication type to a RIS type
mapmastersthesis THES map the BibTeX mastersthesis publication type to a RIS type
mapmisc GEN map the BibTeX misc publication type to a RIS type
mapphdthesis THES map the BibTeX phdthesis publication type to a RIS type
mapproceedings CONF map the BibTeX proceedings publication type to a RIS type
maptechreport RPRT map the BibTeX techreport publication type to a RIS type
mapunpublished UNPB map the BibTeX unpublished publication type to a RIS type
nsf_xyz (none) You can specify an unlimited number of these entries to map non-standard BibTeX fields to RIS tags. The BibTeX field name in this variable has to be in lowercase, regardless of the case in your input data (bib2ris treats field names as case-insensitive). The two-letter RIS tag has to be in uppercase. E.g. to map your BibTeX "Abstract" field to the RIS "N2" tag, the entry would read: "nsf_abstract N2".
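As an illustration, the following commands write a small set of entries to a scratch file. The "variable value" layout follows the nsf_abstract example in the table; the values are arbitrary picks for this sketch, and the scratch file location is an assumption made only so that no real configuration file is touched. Copy the entries you need into your actual bib2risrc.

```shell
# Write a few sample entries to a scratch file (assumed location).
rcfile="${TMPDIR:-/tmp}/bib2risrc.example"
cat > "$rcfile" <<'EOF'
logdest 2
logfile /var/log/bib2ris.log
loglevel 6
abbrevfirst f
listsep spc
mapinproceedings CONF
nsf_abstract N2
EOF
```

With these entries bib2ris would log to the custom log file, keep first names as they are, expect whitespace-separated keywords, map inproceedings entries to CONF instead of CHAP, and send a non-standard "Abstract" field to the "N2" tag.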

Table 16-7. bib2riscgirc

Variable Default Comment
refdblib (none) The path of the directory containing shareable refdb files like DTDs, HTML templates etc.
logfile /var/log/bib2ris.log The full path of a custom log file. This is used only if logdest is set appropriately.
logdest 1 The destination of the log information. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile. The latter needs a proper setting of logfile.
loglevel 6 The log level up to which messages will be sent. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages. -1 means nothing will be logged.
abbrevfirst t If this option is set to "t", the first names of all authors and editors will be abbreviated to the initials. If set to "f", the first names will be used as they are found in the BibTeX bibliography file.
listsep ; This is the delimiter which separates individual keywords in a non-standard keyword field. Use the string "spc" for whitespace-separated lists (spaces and tabs).
forcejabbrev f If this is set to "t", journal names will be wrapped in RIS "JO" entries. If it is set to "f", bib2ris will use "JO" entries only if the journal name contains at least one period, otherwise it will use "JF".
maparticle JOUR map the BibTeX article publication type to a RIS type
mapbook BOOK map the BibTeX book publication type to a RIS type
mapbooklet PAMP map the BibTeX booklet publication type to a RIS type
mapconference CHAP map the BibTeX conference publication type to a RIS type
mapinbook CHAP map the BibTeX inbook publication type to a RIS type
mapincollection CHAP map the BibTeX incollection publication type to a RIS type
mapinproceedings CHAP map the BibTeX inproceedings publication type to a RIS type
mapmanual BOOK map the BibTeX manual publication type to a RIS type
mapmastersthesis THES map the BibTeX mastersthesis publication type to a RIS type
mapmisc GEN map the BibTeX misc publication type to a RIS type
mapphdthesis THES map the BibTeX phdthesis publication type to a RIS type
mapproceedings CONF map the BibTeX proceedings publication type to a RIS type
maptechreport RPRT map the BibTeX techreport publication type to a RIS type
mapunpublished UNPB map the BibTeX unpublished publication type to a RIS type
nsf_xyz (none) You can specify an unlimited number of these entries to map non-standard BibTeX fields to RIS tags. The BibTeX field name in this variable has to be in lowercase, regardless of the case in your input data (bib2ris treats field names as case-insensitive). The two-letter RIS tag has to be in uppercase. E.g. to map your BibTeX "Abstract" field to the RIS "N2" tag, the entry would read: "nsf_abstract N2".

bib2ris' data mangling

This section provides a few hints about the data conversion itself and the BibTeX format requirements.

Post-processing with tex2mail

refdb ships with a slightly modified version of the tex2mail Perl script. The original purpose of this script is to convert (La)TeX input into a human-readable plain text file, taking care of various mathematical commands which can be rendered in multi-line output. In lieu of a better way to provide something useful in no time, I hacked this script to generate suitable RIS output when used with the proper command line switches. Without the -ris switch the script behaves just like the original tex2mail script. The purpose of this script in the context of refdb is to strip TeX commands and constructs from the RIS output that bib2ris generates.

Warning
  • This script is really a quick hack. It will be replaced by something more dedicated to its purpose (at least I'll maintain this illusion for the time being).

  • If you have LaTeX math formulas somewhere in the field values, strange and wondrous things are likely to happen. You will have to manually fix the output.

Run the script with the following command, assuming that foo.ris is the output that you generated from your BibTeX bibliography file with the help of bib2ris:

~# tex2mail -noindent -ragged -linelength 65535 -ris < foo.ris > foo-notex.ris

The argument of the -linelength option should be large enough to display each field in a single line, otherwise tex2mail tries to generate some simple layout which will screw up the RIS file.
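Both steps, the conversion and the removal of the TeX formatting, can be wrapped into one function. This is a sketch only: the function name bib_to_plain_ris is made up, the tex2mail switches are the ones shown above, and the intermediate RIS data go to a temporary file so that the exit code of bib2ris is not lost.

```shell
# Convert a BibTeX file to RIS and strip the TeX formatting.
# $1 = input BibTeX file, $2 = output RIS file.
# Returns the exit code of bib2ris.
# bib_to_plain_ris is a hypothetical helper name.
bib_to_plain_ris() {
    tmpris=$(mktemp) || return 1
    status=0
    bib2ris < "$1" > "$tmpris" || status=$?
    tex2mail -noindent -ragged -linelength 65535 -ris < "$tmpris" > "$2"
    rm -f "$tmpris"
    return "$status"
}

# Intended use (not run here):
#   bib_to_plain_ris foo.bib foo-notex.ris
```

It is still prudent to look at the resulting file before you send it to refdbc, especially if your entries contain math formulas.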