refdb handbook: covers version 0.9.6
Chapter 15. Reference data conversion tools

15.4. bib2ris

bib2ris converts BibTeX bibliography files into RIS files. The filter is implemented using Greg Ward's outstanding btparse library.

Unfortunately the concepts underlying BibTeX and RIS bibliographic data are quite different so that BibTeX data do not readily lend themselves to a clean conversion to the RIS format. This is not meant as an excuse to provide a bad filter but you should be aware that a few compile-time assumptions have to be made in order to get reasonable results. In any case, as the data models differ considerably, a loss-free interconversion between the two data types is not possible: If you convert a BibTeX bibliography file to RIS and then back, the result will differ considerably from your input.

There are basically two ways to handle BibTeX data with refdb:

Convert all entries to plain text. This will allow you to work with your data just as with "native" RIS data, i.e. all field values in the output of the refdb backends will be plain text as well.
Keep the TeX formatting in the entries. This will allow you to make use of TeX commands and formatting stuff in the BibTeX bibliography output, but it'll be a bit strange to work with these data in the rest of refdb. When formulating queries you will have to take account of the TeX magic, and this stuff will also show up in all other output (screen, HTML, DocBook etc).

There may be better support for this situation in future releases of refdb. Currently the rule of thumb is: If you're interested only in BibTeX bibliographies, keep the formatting. If you're interested in generating both BibTeX and DocBook bibliographies, or if you're mainly interested in maintaining an easily accessible reference database, strip off the TeX formatting. This is best done with the supplied tex2mail script, which is discussed in Section 15.4.4.

That said, you may still be interested in seeing how bib2ris works.

15.4.1. Starting bib2ris

Start bib2ris with the command:

bib2ris [-e log-destination] [-h] [-j] [-l log-level] [-L log-file] [-q] [-s separator] [-v] [-y confdir] [file]

Remember that you don't have to specify all these options each time if you define the values in bib2risrc.

The -e option defines the destination of log output. In order for log output to appear at all, the log level has to be specified correctly with the -l option. A log-destination argument of 0 directs log output to stderr, 1 uses the syslog facility, 2 uses a custom log file. For the latter to work you have to specify a log filename with the -L option.

With the -h option bib2ris displays a brief help screen and exits.

Use the -j option to force bib2ris to use "JO" RIS fields in all cases. If this option is not used, bib2ris tries to infer whether a journal name is an abbreviation or not. If the string contains at least one period, "JO" will be used, otherwise "JF" will be used.
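
The rule described above can be sketched in a few lines of shell. This only illustrates the documented behaviour and is not the code bib2ris actually runs; the function name journal_tag and its second argument are made up for this example.

```shell
# Sketch of the documented JO/JF rule; journal_tag is a hypothetical helper,
# not part of bib2ris.
# Usage: journal_tag NAME [FORCE]   (FORCE=t mimics the -j option)
journal_tag() {
    name=$1
    force=${2:-f}
    if [ "$force" = "t" ]; then
        echo "JO"            # -j: always treat the name as an abbreviation
        return 0
    fi
    case $name in
        *.*) echo "JO" ;;    # at least one period: taken to be an abbreviation
        *)   echo "JF" ;;    # no period: taken to be a full journal name
    esac
}

journal_tag "J. Biol. Chem."   # prints JO
journal_tag "J Biol Chem"      # prints JF
journal_tag "J Biol Chem" t    # prints JO
```

The second call shows why -j exists: an abbreviation written without periods is not recognized by the heuristic.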

The -l option sets the maximum log level: messages with a level up to and including this value are logged. If you specify a high level (up to 7), all sorts of messages including debug messages are logged. If you specify a low level (down to 0), only critical errors are logged. Specify -1 to disable logging.

The -L option specifies a filename which is used as a custom log file if the -e option is set appropriately.

Note: The underlying btparse library sends some warnings and errors directly to stderr. Currently (i.e. without patching btparse) this behaviour cannot be controlled with the -e, -l, and -L switches. If you want to log these messages to a file as well you will have to employ some shell magic to redirect the output.
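
The shell magic mentioned in the note can be as simple as a stderr redirection. The following is a minimal sketch, assuming a Bourne-type shell; the function name convert_bib, the BIB2RIS override variable, and the file names are hypothetical and exist only to keep the example self-contained.

```shell
# Sketch: collect everything bib2ris and btparse write to stderr in a file.
# convert_bib and the BIB2RIS override are hypothetical, not part of refdb.
# Usage: convert_bib INPUT.bib OUTPUT.ris ERRORLOG
convert_bib() {
    input=$1
    output=$2
    errlog=$3
    # stdout carries the RIS data, stderr carries the btparse messages;
    # 2>> appends so that earlier messages are kept
    "${BIB2RIS:-bib2ris}" < "$input" > "$output" 2>> "$errlog"
}
```

A call like "convert_bib foo.bib foo.ris btparse-messages.log" then keeps the btparse output out of the terminal. If the log destination is set to 0, the regular bib2ris log messages go to stderr as well and end up in the same file.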

Use the -q option to temporarily switch off the settings in the init files. bib2ris will then use the compile-time defaults unless you specify things with the command line switches (useful for debugging configuration file settings).

The -s option specifies the delimiter which separates individual keywords in a non-standard keyword field. Use the string "spc" for whitespace-separated lists (spaces and tabs).

-v prints the version information and brief licensing information, then exits.

Use the -y option to specify the directory where the global configuration files are.

Note: By default, all refdb applications look for their configuration files in a directory that is specified during the configure step when building the package. That is, you don't need the -y option unless you use precompiled binaries in unusual locations, e.g. by relocating an rpm package.

All other command line parameters will be interpreted as input filenames. bib2ris can read the incoming data either from these files or from stdin. If data are available at stdin, the filename arguments will be ignored. The output is always sent to stdout, so you can either view the result by piping into a pager or redirect the data into a file. Of course it is also possible to directly pipe the result into refdbc but it may be prudent to manually check the output before sending something to refdbc that you may later regret.

The exit code of bib2ris indicates what went wrong in general (the details can be found in the log output). The code is the sum of the following error values:

Table 15-3. bib2ris exit codes

code	explanation
1	general error; includes out of memory situations and invalid command-line options
2	incomplete entry (at least one essential field in an entry was missing)
4	unknown field name
8	unknown publication type
16	invalid BibTeX->RIS type mapping
32	parse error; includes file access errors

As an example, if bib2ris exits with the error code 18, simple math would tell you that there was at least one error in the BibTeX-to-RIS mapping (16), most likely an invalid RIS tag, and at least one incomplete entry (2), the sum of which yields 18.

Note: Under some (really bad) error conditions the underlying btparse library exits without returning control to bib2ris. In that case the exit code is determined by btparse, not by bib2ris.
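
The exit code arithmetic can also be left to the shell. The sketch below tests each error value of Table 15-3 against the exit code with a bitwise AND; the function name explain_exit_code is made up for this example. It assumes the code really comes from bib2ris, so it does not apply to the btparse case described in the note above.

```shell
# Sketch: split a bib2ris exit code into the error values of Table 15-3.
# explain_exit_code is a hypothetical helper, not part of refdb.
explain_exit_code() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no error"
        return 0
    fi
    for entry in \
        "1:general error" \
        "2:incomplete entry" \
        "4:unknown field name" \
        "8:unknown publication type" \
        "16:invalid BibTeX->RIS type mapping" \
        "32:parse error"
    do
        value=${entry%%:*}    # the part before the first colon
        text=${entry#*:}      # the part after the first colon
        # the error value contributed to the sum if its bit is set
        if [ $((code & value)) -ne 0 ]; then
            echo "$value $text"
        fi
    done
    return 0
}

explain_exit_code 18
# prints:
# 2 incomplete entry
# 16 invalid BibTeX->RIS type mapping
```

In practice the function would be called with $? right after a bib2ris run.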

The following examples show how bib2ris reads data from input files or from stdin, respectively.

~# bib2ris *.bib | less

This command will convert all .bib files in the current directory and display the result in a pager.

~# bib2ris < foo.bib > foo.ris

This command reads the data via redirection from foo.bib and redirects the output into the file foo.ris.

15.4.2. The bib2ris variables

bib2ris evaluates the file bib2risrc to initialize itself.

Table 15-5. bib2risrc

Variable	Default	Comment
refdblib	(none)	The path of the directory containing shareable refdb files like DTDs, HTML templates etc.
logfile	/var/log/bib2ris.log	The full path of a custom log file. This is used only if logdest is set appropriately.
logdest	1	The destination of the log information. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile. The latter needs a proper setting of logfile.
loglevel	6	The log level up to which messages will be sent. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages. -1 means nothing will be logged.
abbrevfirst	t	If this option is set to "t", the first names of all authors and editors will be abbreviated to the initials. If set to "f", the first names will be used as they are found in the BibTeX bibliography file.
listsep	;	This is the delimiter which separates individual keywords in a non-standard keyword field. Use the string "spc" for whitespace-separated lists (spaces and tabs).
forcejabbrev	f	If this is set to "t", journal names will be wrapped in RIS "JO" entries. If it is set to "f", bib2ris will use "JO" entries only if the journal name contains at least one period, otherwise it will use "JF".
maparticle	JOUR	map the BibTeX article publication type to a RIS type
mapbook	BOOK	map the BibTeX book publication type to a RIS type
mapbooklet	PAMP	map the BibTeX booklet publication type to a RIS type
mapconference	CHAP	map the BibTeX conference publication type to a RIS type
mapinbook	CHAP	map the BibTeX inbook publication type to a RIS type
mapincollection	CHAP	map the BibTeX incollection publication type to a RIS type
mapinproceedings	CHAP	map the BibTeX inproceedings publication type to a RIS type
mapmanual	BOOK	map the BibTeX manual publication type to a RIS type
mapmastersthesis	THES	map the BibTeX mastersthesis publication type to a RIS type
mapmisc	GEN	map the BibTeX misc publication type to a RIS type
mapphdthesis	THES	map the BibTeX phdthesis publication type to a RIS type
mapproceedings	CONF	map the BibTeX proceedings publication type to a RIS type
maptechreport	RPRT	map the BibTeX techreport publication type to a RIS type
mapunpublished	UNPB	map the BibTeX unpublished publication type to a RIS type
nsf_xyz	(none)	You can specify an unlimited number of these entries to map non-standard BibTeX fields to RIS tags. The BibTeX field name in this variable has to be in lowercase, regardless of the case in your input data (bib2ris treats field names as case-insensitive). The two-letter RIS tag has to be in uppercase. E.g. to map your BibTeX "Abstract" field to the RIS "N2" tag, the entry would read: "nsf_abstract N2".
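
As an illustration, the following sketch writes a sample configuration to a scratch file so that it can be reviewed before it is copied into place. The layout of one variable and its value per line is inferred from the "nsf_abstract N2" example in the table; where the file finally has to live is not covered here. The nsf_keywords line assumes that your input data use a non-standard field called "keywords", which is purely hypothetical.

```shell
# Sketch: write a sample bib2risrc to a scratch file for review.
# The "variable value" layout is inferred from the nsf_abstract example;
# nsf_keywords assumes a hypothetical "keywords" field in the input data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
logdest 2
logfile /var/log/bib2ris.log
loglevel 6
abbrevfirst f
listsep ;
forcejabbrev t
mapconference CONF
nsf_abstract N2
nsf_keywords KW
EOF
echo "sample configuration written to $sample"
```

All variables that are left out keep the defaults listed in the table.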

15.4.3. bib2ris' data mangling

This section provides a few hints about the data conversion itself and the BibTeX format requirements.

The parsing of the input data is done by the btparse library. All limitations of that library apply to bib2ris as well. This applies very specifically to two hardcoded settings in btparse which, simply put, limit the size and complexity (in terms of macros) of an input file that btparse can handle. If you run into this kind of problem (I had to pull a 2 MB BibTeX bibliography from the net in order to verify this limit) you should increase the values of NUM_MACROS and STRING_SIZE in the source file macros.c and recompile the btparse library.
All entry names and field names in the BibTeX input file are treated as case-insensitive, i.e. "BoOk" is the same as "book" and "AUTHOR" is the same as "aUthoR".
The entries are checked for completeness. An error is generated if an entry lacks fields which are considered essential for the particular publication type.
Non-standard fields can be imported in addition to the predefined BibTeX fields. In your bib2ris configuration file, create an entry for each non-standard BibTeX field name that your input data use. The data are handled differently depending on the RIS field they are imported to. If the data are imported to the RIS fields AD, N1, or N2, which have an essentially unlimited size, all occurrences of these fields will be concatenated into a single AD, N1, or N2 tag line, respectively. If the data are mapped to the RIS KW field, the string will be tokenized based on the list separator specified in the listsep configuration variable, and each token will be written as a separate KW tag line. A special case is the RIS pseudo-field "PY.day": data imported to this tag are integrated as the day part of the publication date tag line "PY" (year and month, but not day, are standard BibTeX fields and are recognized by default). All other fields will be printed with their requested RIS tag. It is at the discretion of any RIS importing application to decide what to do with duplicate tag lines. Multiples are allowed for author tags (AU, A2, A3) and the keyword tag (KW). For a tag that does not allow multiple occurrences, refdb will use the last occurrence of the tag line.
Abbreviated journal names are detected only if they use periods. E.g. "J. Biol. Chem." will be mapped to a "JO" RIS element whereas "J Biol Chem" will be (incorrectly) mapped to a "JF" element ("Journal of Biological Chemistry" would correctly end up here too). Spaces after periods are optional. To capture "J Biol Chem" in a "JO" element, use the -j command line option or the "forcejabbrev" configuration file variable.
The mapping of BibTeX publication types (book, inproceedings...) to RIS types as specified in the configuration file is checked for valid RIS types. If an invalid RIS type is specified, an error is generated and the compile-time default is used instead.
The first names of authors and editors are either abbreviated to the initials or used as they are found in the input data, depending on the abbrevfirst configuration variable.
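
The keyword handling described in the list above can be pictured with a small shell sketch. It is an illustration of the documented behaviour only, not the bib2ris implementation; the function name split_keywords is made up, a single-character separator is assumed, and trimming the blanks around each token is an assumption as well.

```shell
# Sketch of the documented keyword tokenizing; split_keywords is hypothetical.
# Usage: split_keywords STRING LISTSEP   (LISTSEP is one character or "spc")
split_keywords() {
    string=$1
    sep=$2
    if [ "$sep" = "spc" ]; then
        # whitespace-separated list: turn spaces and tabs into line breaks
        printf '%s\n' "$string" | tr ' \t' '\n\n'
    else
        # turn every separator character into a line break
        printf '%s\n' "$string" | tr "$sep" '\n'
    fi |
        # trim blanks (an assumption), drop empty tokens, add the RIS tag
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
            -e '/^$/d' -e 's/^/KW  - /'
}

split_keywords "apoptosis; cell cycle;p53" ";"
# prints:
# KW  - apoptosis
# KW  - cell cycle
# KW  - p53
```

With "spc" as the separator a multi-word keyword such as "cell cycle" would be split into two KW lines, so a character like ";" is the safer choice for such data.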

15.4.4. Post-processing with tex2mail

refdb ships with a slightly modified version of the tex2mail Perl script. The original purpose of this script is to convert (La)TeX input into a human-readable plain text file, taking care of various mathematical commands which can be rendered in multi-line output. For lack of a better way to provide something useful quickly, I hacked this script to generate suitable RIS output when used with the proper command line switches. Without the -ris switch the script behaves just like the original tex2mail script. The purpose of this script in the context of refdb is to strip TeX commands and constructs from the RIS output that bib2ris generates.

Warning

This script is really a quick hack. It will be replaced by something more dedicated to its purpose (at least I'll maintain this illusion for the time being).
If you have LaTeX math formulas somewhere in the field values, strange and wondrous things are likely to happen. You will have to manually fix the output.

Run the script with the following command, assuming that foo.ris is the output that you generated from your BibTeX bibliography file with the help of bib2ris:

~# tex2mail -noindent -ragged -linelength 65535 -ris < foo.ris > foo-notex.ris

The argument of the -linelength option should be large enough to display each field in a single line, otherwise tex2mail tries to generate some simple layout which will screw up the RIS file.
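
Both conversion steps can be wrapped in a small shell function. The sketch below deliberately uses an intermediate file instead of a pipe, because in a plain pipe the shell reports only the exit status of the last command and the bib2ris exit code would be lost. The function name bib_to_plain_ris and the BIB2RIS and TEX2MAIL override variables are hypothetical; the tex2mail switches are the ones shown above.

```shell
# Sketch: convert BibTeX to RIS, then strip the TeX formatting.
# bib_to_plain_ris and the BIB2RIS/TEX2MAIL overrides are hypothetical.
# Usage: bib_to_plain_ris INPUT.bib OUTPUT.ris
bib_to_plain_ris() {
    input=$1
    output=$2
    raw=$(mktemp) || return 1
    # step 1: BibTeX to RIS, TeX formatting still in place
    "${BIB2RIS:-bib2ris}" < "$input" > "$raw"
    status=$?
    if [ "$status" -ne 0 ]; then
        # keep the intermediate file so that it can be inspected
        echo "bib2ris exited with code $status, see $raw" >&2
        return "$status"
    fi
    # step 2: strip TeX commands, one RIS field per output line
    "${TEX2MAIL:-tex2mail}" -noindent -ragged -linelength 65535 -ris \
        < "$raw" > "$output"
    status=$?
    rm -f "$raw"
    return "$status"
}
```

A call like "bib_to_plain_ris foo.bib foo-notex.ris" then returns the bib2ris exit code if the first step fails, and the tex2mail exit status otherwise.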
