15.2. med2ris

This Perl script converts PubMed reference data into RIS data. The converter understands both the tagged PubMed format (which superficially resembles RIS) and the XML format defined by the PubMedArticle DTD. In most cases med2ris detects the input data type automatically.

15.2.1. Starting med2ris

Start the script with the following command:

[perl] med2ris [-e dest] [-f enc] [-h] [-i] [-l level] [-L logfile] [-o file] [-O file] [-q] [-t enc] [-T type] [-y path] [infile...]

Note: Specifying the command interpreter perl on the command line is not necessary if it is in the default location /usr/bin/perl.

The -e option takes either a numeric (0|1|2) or a symbolic (stderr|syslog|file) argument to specify the log destination.

The -f and -t options select the input and output character encoding, respectively. Supported encodings are platform-dependent and can usually be retrieved by running man iconv or man iconv_open. If no encodings are specified, "ISO-8859-1" aka Latin-1 is assumed for both input and output.
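Before passing encoding names to -f and -t, it can help to check whether a given pair is available on your platform. The following is a minimal sketch; the function name encoding_pair_supported is made up for this illustration, and it assumes that the iconv command-line tool reflects the same encoding support that med2ris sees on your system.

```shell
# Hypothetical helper: succeed if the local iconv can convert from
# encoding $1 to encoding $2, fail otherwise.
# Assumption: the iconv command-line tool supports the same encodings
# as the conversion facility used by med2ris on this platform.
encoding_pair_supported() {
    printf 'test' | iconv -f "$1" -t "$2" >/dev/null 2>&1
}

if encoding_pair_supported "ISO-8859-1" "UTF-8"; then
    echo "ISO-8859-1 -> UTF-8 is available"
else
    echo "ISO-8859-1 -> UTF-8 is not available"
fi
```

If the check fails, the list of names accepted by your iconv (often shown by iconv -l) is the place to look for an alternative spelling of the encoding.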

The -h option displays a brief usage message.

Set the -i option to output additional information about unknown or unused tags.

Use the -l option to set the log level to a numeric value between 0 and 7 or to a symbolic value (alert|crit|err|warning|notice|info|debug). If the log destination is "file", the -L option specifies the full path of a custom log file.

The -o and -O options cause med2ris to write the output data into a file. The lowercase -o option overwrites any existing file of the same name, whereas the uppercase -O option appends the output to an existing file. If neither option is used, the output is written to stdout.

The -q option causes med2ris to skip the configuration file, which is mainly useful for debugging purposes.

Use the -T option to override the automatic input data type detection. Possible values for type are "xml" and "tag" for the XML and tagged data formats, respectively.

The -y switch can be used to specify the location of the refdb shared data in case the automatic script configuration is not appropriate on your system.

The input data are read from stdin unless one or more filenames are specified on the command line. In the latter case, the output generated from all files will be sent to stdout or to the output file.

The following examples show the usage of med2ris for file-based and stream-based input and output, respectively.

~# perl med2ris -o out.ris pm*

This will convert all files in the current directory starting with pm and write the output into out.ris, overwriting any existing file with the same name.

Note: You can leave out the "perl" in the above command if your Perl interpreter is in the default location /usr/bin/perl, as shown in the next example.

~# med2ris -f "ISO-8859-1" -t "UTF-8" < pm001.txt >> out.ris

This will send the contents of pm001.txt to med2ris on stdin for conversion. The result will be appended to the file out.ris. The input data are assumed to be Latin-1, whereas the output will be encoded as UTF-8.

15.2.2. The med2ris configuration variables

med2ris evaluates the file med2risrc to set its default values.

Table 15-1. med2risrc

Variable   Default               Comment
outfile    (none)                The default output file name.
outappend  t                     Determines whether output is appended (t) to an existing file or overwrites (f) an existing file.
unmapped   t                     If set to t, unknown tags in the input data will be output following a <unmapped> tag; the resulting data can be inspected and then be sent through sed to strip off these additional lines. If set to f, unknown tags will be gracefully ignored.
from_enc   ISO-8859-1            The character encoding of the input data.
to_enc     ISO-8859-1            The character encoding of the output data.
logfile    /var/log/med2ris.log  The full path of a custom log file. This is used only if logdest is set appropriately.
logdest    1                     The destination of the log information. 0 = print to stderr; 1 = use the syslog facility; 2 = use a custom logfile. The latter needs a proper setting of logfile.
loglevel   6                     The log level up to which messages will be sent. A low setting (0) allows only the most important messages, a high setting (7) allows all messages including debug messages. -1 means nothing will be logged.
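The description of the unmapped variable mentions sending the output through sed to strip the additional lines. A minimal sketch of such a filter follows. It assumes that each additional line starts with the literal marker <unmapped>; the function name strip_unmapped and the sample records are made up for this illustration, and the exact layout of real med2ris output may differ.

```shell
# Hypothetical filter: delete every line that starts with the literal
# <unmapped> marker and pass all other lines through unchanged.
# Assumption: med2ris writes the marker at the very beginning of a line.
strip_unmapped() {
    sed '/^<unmapped>/d'
}

# Made-up sample data, not real med2ris output.
printf '%s\n' 'TY  - JOUR' '<unmapped>XX  - some unknown field' 'ER  - ' | strip_unmapped
```

With the sample data above, only the TY and ER lines remain. In practice the filter would sit between med2ris and the final file, for example by piping the stdout of med2ris through strip_unmapped after you have inspected the unmapped lines.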

15.2.3. med2ris' behind-the-scenes data mangling

Keywords with multiple MeSH subheadings are split into multiple keywords with one MeSH subheading each. This simplifies searching for MeSH subheadings greatly.
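The splitting can be illustrated with a short sketch. This is not the code med2ris uses; it assumes that a keyword has the form heading/subheading/subheading, with a slash as the separator, as in the MH field of tagged PubMed data. The function name split_mesh_keyword is made up for this illustration.

```shell
# Hypothetical illustration of the keyword splitting described above.
# Input:  one keyword per line, e.g. heading/subheading1/subheading2
# Output: one line per subheading, each prefixed with the heading.
# Keywords without a subheading are passed through unchanged.
split_mesh_keyword() {
    awk -F'/' '
        NF < 2 { print; next }
        { for (i = 2; i <= NF; i++) print $1 "/" $i }
    '
}

printf '%s\n' 'Neoplasms/drug therapy/*genetics' | split_mesh_keyword
```

For the sample keyword this prints two keywords, Neoplasms/drug therapy and Neoplasms/*genetics, so that a search for a single subheading matches a complete keyword.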

med2ris does not validate the input files. That is, the input files must conform to the format rules of the data source; otherwise the conversion results are unpredictable.