refdb tutorial: A day with the refdb clients
Chapter 2. Managing references and notes

Finding references

The easiest way to get started with a refdb database is to see what's in there. Running queries does not alter the database, so it is safe to play around. We're assuming now that your default database already contains a few references. If this is not the case, you'll have to read about adding references first.

Once you've got used to working with reference management software, you'll find that retrieving references is one of the most common tasks. refdbc offers the getref command for this purpose.

Before we look at the syntax of this command, let's discuss what is important about the data you retrieve:

Which references to retrieve. In most cases, you are interested in particular references, not in a complete listing. Therefore you'll specify search criteria that will narrow down the list of matching references.
Which output format to use. References can be retrieved in a variety of formats, and which one you choose depends on what you'd like to do with them.
Where to put them. You can either display the results on the screen, or send the references to a file for later processing. Advanced users can even pipe the results to a shell command for further processing.

Note: This is why you won't find commands like "Export" or "Print". Instead you can write any output format to a file in order to export it or pipe any output format to lpr in order to print it, respectively.

Let's now look at the syntax of the command. This is admittedly the most complex refdbc command you'll have to deal with:

refdbc: getref -h
Displays the result of a database search.
Syntax: getref  [-c command] [-d database] [-E encoding] [-h] [-o outfile] [-O outfile][-P] [-R pdfroot] [-s format] [-S tag] [-t output-format] {search-string|-f infile}
Search-string: {:XY:{<|=|~|!=|!~|>}{string|regexp}} [AND|OR|AND NOT] [...]
where XY specifies the field to search in
Options: -c command   pipe the output through command
         -d database  specify the database to work with
         -E encoding  set the output character encoding
         -h           prints this mini-help
         -o outfile   save the output in outfile (overwrite)
         -O outfile   append the output to outfile
         -P           limit search to personal interest list
         -R           use pdfroot as root for path of pdf files
         -s format    specify fields for screen or style for DocBook output
         -S tag       sort output by tag ID (default) or PY
         -t output-format display as format scrn, html, xhtml, db31, db31x, teix, ris, risx, or bibtex
         -f infile    use the saved search line in file infile
         All other arguments are interpreted as the search string.
refdbc:

Now let us dissect this syntax into manageable chunks of information. We'll look at the bits and pieces of this command along the lines of the list above.

Which references do you want to retrieve?

refdbc uses a query language that is fairly simple yet powerful enough for more advanced tasks. Let's start with a very simple example:

refdbc: getref :ID:=7
ID*:7 (2002)                  
Key: Bellamy2002              
Bellamy,T.C., Garthwaite,J.   
Pharmacology of the nitric oxide receptor, soluble guanylyl cyclase,
in cerebellar cells           
Br.J.Pharmacol. 136(1):95-103

The only argument we passed to the getref command is the string ":ID:=7" (please note that there are no spaces on either side of the comparison operator). That is, we asked for the reference that has the numeric identifier "7" (all references have a unique identifier like this; it is automatically created when the reference is added to the database). Every refdbc command that accepts command line options falls back to defaults for the options you don't specify, so in this case refdb sent back the reference in the "scrn" format and displayed the results on the screen using your favourite pager (the available output formats will be presented shortly). By default, the "scrn" format shows the following information:

ID*:7 (2002): This is the ID of the reference, followed by the publication year in parentheses. The asterisk "*" tells you that the reference is part of your personal reference list, an advanced feature discussed below.
Key: Bellamy2002: This is the unique citation key of the reference. The citation key can be supplied by the user when a reference is added, otherwise refdb creates one for you. The purpose of the citation key is to provide a "handle" for a reference with an easy-to-remember name. Citation keys are preferred to IDs when writing citations in documents.
Bellamy,T.C., Garthwaite,J.: This line lists all authors of the reference.
Pharmacology of the nitric oxide receptor, soluble guanylyl cyclase, in cerebellar cells: This is the title of the reference.
Br.J.Pharmacol. 136(1):95-103: This line contains the publication information. As this is a citation of a journal article, it shows the name of the journal as well as the volume, issue, and page information of the particular article.

You'll learn in the next section how to tell getref to display more or all available information about the retrieved references.

Available fields to query

Retrieving references by their ID value is not very common with one exception: If for some reason you want to retrieve all references, you can use the command:

refdbc: getref :ID:>0

As all references have an ID value greater than 0, this command catches them all. But beware, this could literally be thousands of references, so don't keep your server busy just for fun.

In most cases you have some idea about the reference you're looking for. Either you know the name of an author, or when the document was published. Maybe you know a word or a phrase from the title, or you want to use keywords to look up references about a particular topic. The getref query language allows you to use all data fields in any combination to retrieve references. Let's first have a look at what fields are available besides the ID field we saw above.

:TY:: Type of the reference.
:ID:: The unique identifier of a reference. This is the numeric identifier that refdb assigns to each new reference.
:CK:: The unique citation key of a reference. This is the alphanumeric string that was either supplied by the user or automatically generated by refdb.
:TI:, :T2:, :T3:: The title of the reference, of the secondary title, and of the series title, respectively.
:AU:, :A2:, :A3:: The name of an author, of a secondary author/editor, and of a series author, respectively.
:PY:, :Y2:: The publication date and the secondary date, respectively (numeric).
:N1:: The notes that the user can add to the reference.
:KW:: A keyword.
:RP:: The reprint status of the reference.
:AV:: The location of an offprint (physical, URL, or path).
:SP:: The start page.
:EP:: The end page.
:JO:, :JF:, :J1:, :J2:: The abbreviated name, the full name, the user abbreviation 1, and the user abbreviation 2 of a journal name, respectively.
:VL:: The volume number.
:ED:: The name of an editor.
:IS:: The issue (article) or chapter (book part) number.
:CY:: City of publication of a book.
:PB:: Name of the publishing company.
:U1: through :U5:: The user-defined fields 1 through 5. The intended contents of these fields are site-specific. Ask your administrator.
:N2:: The abstract of the reference.
:SN:: The ISSN or ISBN number.
:M1: through :M3:: The Miscellaneous fields 1 through 3. The intended contents of these fields are site-specific. Ask your administrator.
:AD:: The address of the contact person.
:UR:: The URL of a web page related to the reference.

In addition to the above field specifiers, there are a few that allow you to retrieve references based on extended notes attached to them (see below for an explanation of extended notes):

:NID:: The ID of an extended note.
:NCK:: The alphanumeric key of an extended note.

Doing comparisons

We have to distinguish between numeric and alphanumeric fields, as indicated in the list above. This determines which comparison operators are available for that particular field. Alphanumeric fields can use the operators "=" (literal equality) and "!=" (literal non-equality) as well as "~" (regular expression equality) and "!~" (regular expression non-equality). Numeric fields can use the operators "=" (equality), "!=" (non-equality), "<" (less than), and ">" (greater than).

The comparison of numeric fields should be straightforward, but the comparison of alphanumeric fields needs a little further elucidation. Alphanumeric comparisons use either literal strings or regular expressions on the right-hand side of the operator. In the first case, the query matches if the whole string in the field indicated on the left is identical ("=") or not identical ("!=") to the string on the right. In the second case, the query matches if the regular expression finds a match somewhere in the text stored in the requested field.

That is, the comparison ":N1:~'warts'" would return a result if the character sequence "warts" is somewhere in the text stored in the notes (N1) field. On the other hand, the comparison ":N1:='warts'" would match only if a dataset contains a note that consists of the single word "warts".
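To make the difference between the operators more tangible, here is a minimal Python sketch. It is not how refdb evaluates queries internally (the database engine does that); the function matches and the sample note are made up for illustration, and Python's re module merely stands in for the regular expression engine of your database.

```python
import re

def matches(field_value, op, operand):
    """Evaluate one comparison in the spirit of the getref operators."""
    if op == "=":
        return field_value == operand                        # whole value must be identical
    if op == "!=":
        return field_value != operand
    if op == "~":
        return re.search(operand, field_value) is not None   # a match anywhere is enough
    if op == "!~":
        return re.search(operand, field_value) is None
    if op == "<":
        return field_value < operand                         # numeric fields only
    if op == ">":
        return field_value > operand                         # numeric fields only
    raise ValueError(f"unknown operator: {op}")

note = "treatment of plantar warts"   # made-up contents of an N1 field

print(matches(note, "~", "warts"))     # regular expression: found inside the text
print(matches(note, "=", "warts"))     # literal: the whole field would have to be "warts"
print(matches("warts", "=", "warts"))  # literal match on a one-word note
print(matches(2002, ">", 2000))        # numeric comparison, as used for :PY:
```

Only the second call reports no match, because the literal operator compares the complete field contents.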

Regular expressions are a complex but powerful feature. For the purposes of this tutorial we'll keep it simple and work only with these features:

All regular characters like 'a', 'b', '3', and so forth, simply match themselves. That is, the regular expression "ab" will match the string "ab" as well as "cab" and "cabby", but not "ba" or "lumbar".
The regular expression "." (period) matches any single character, and ".*" (period asterisk) matches zero or more occurrences of any character (the asterisk works just as well after any character other than the period). That is, "b.*b" matches "bb", "bob", and "beeeebop".
So the period and the asterisk are special characters within a regular expression. But what if you want to match those literally? Then you'll have to escape the special character with a backslash. So "bad\." will match "bad." but not "bad;". What if you want to match a backslash? Then you obviously use a double backslash.
The '^' and '$' special characters serve as anchors for the regular expression. If you precede a regular expression with '^', the expression will match only at the beginning of the text. For example, the regular expression "^mule" will match the string "mules", but not "two mules". The '$' special character does the same at the end of the line. The bright among the readers may have figured out already how to specify a full match: "^mule$" will match only the string "mule", but not "two mules" or "mules".
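If you'd like to try these rules without touching a database, the following Python sketch runs the example patterns from the list above against the example strings. Python's re module is used as a stand-in here; the regular expression dialect of your database engine may differ in more advanced features, but the basics shown in this tutorial behave the same way. The helper name matching is made up.

```python
import re

# Each pattern from the list above, paired with the strings discussed there.
examples = [
    ("ab", ["ab", "cab", "cabby", "ba", "lumbar"]),
    ("b.*b", ["bb", "bob", "beeeebop"]),
    (r"bad\.", ["bad.", "bad;"]),
    ("^mule", ["mules", "two mules"]),
    ("^mule$", ["mule", "mules", "two mules"]),
]

def matching(pattern, candidates):
    """Return the candidates in which the pattern finds a match somewhere."""
    return [text for text in candidates if re.search(pattern, text)]

for pattern, candidates in examples:
    print(pattern, "->", matching(pattern, candidates))
```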

With this knowledge, we can try a few more simple queries (the query results are not shown in these examples for the sake of brevity):

refdbc: getref :PY:>2000

This will list all references that were published in 2001 or later.

refdbc: getref :AU:~^Miller

This will list all references with at least one author whose name starts with "Miller".

refdbc: getref :KW:~'guanyl.* cyclase'

This will list all references with at least one keyword that may be "guanylyl cyclase" or "guanylate cyclase" or something related. Note that we had to enclose the regular expression in quotation marks because the expression contains a space.

refdbc: getref :RP:="IN FILE"

This will list all references with a reprint status "IN FILE". The reprint status field may only contain one of the values "NOT IN FILE", "ON REQUEST", or "IN FILE". As we used a literal match instead of a regular expression match, the query does not return the datasets with the value "NOT IN FILE", although "IN FILE" is a part of that string.
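If you generate search strings from a script, the rules seen so far (no spaces around the operator, quotation marks around values that contain a space) are easy to encode. The following Python sketch shows one way to do it. The function term is hypothetical and not part of refdb, and the quoting rule is an assumption derived from the keyword example above; values that themselves contain quotation marks are not handled.

```python
def term(field, op, value):
    """Format one search term as :XY:<op><value>, without spaces around the operator."""
    if op not in ("<", "=", "~", "!=", "!~", ">"):
        raise ValueError(f"unsupported operator: {op}")
    text = str(value)
    # Assumption: values containing whitespace are wrapped in single quotes,
    # as in the keyword query above. Embedded quotes are not handled here.
    if any(ch.isspace() for ch in text):
        text = f"'{text}'"
    return f":{field}:{op}{text}"

print(term("PY", ">", 2000))
print(term("AU", "~", "^Miller"))
print(term("KW", "~", "guanyl.* cyclase"))
```

The three calls reproduce the search strings of the first three queries in this section.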

Combining and grouping fields

What if using just one field doesn't narrow down your search sufficiently? Obviously there must be a way to combine fields in a search. refdb understands the boolean operators "AND", "OR", and "AND NOT" for this purpose. If you want to retrieve articles published by Dr. Miller in 1999, you'd use a query like:

refdbc: getref :AU:~"^Miller" AND :PY:=1999

You can join as many fields with boolean operators as you see fit. If you run queries involving several fields it might be necessary to group parts of your query using parentheses. Consider the following query:

refdbc: getref :AU:~"^Miller" AND :PY:=1999 OR :PY:=2000

You wanted to retrieve all papers by Dr. Miller published in either 1999 or 2000, but this query would return tons of papers published by other authors as well. How come? The reason is that the queries are evaluated from left to right unless parentheses change this order. In the example above, refdb would put all references with an author starting with "Miller" in a bag. Then it would take all of Dr. Miller's references out of the bag that are not published in 1999. Finally it would add all references published in 2000 by whichever author to the bag.

This is clearly different from what you had in mind. Now look at this query:

refdbc: getref :AU:~"^Miller" AND (:PY:=1999 OR :PY:=2000)

This would again put all of Dr. Miller's publications into a big bag, but this time refdb would go ahead and remove all of Dr. Miller's publications not dated 1999 or 2000 from the bag. Now you win.
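The bag metaphor translates directly into set operations. The following Python sketch uses a made-up mini database of five references to show how the two queries differ; it only illustrates the logic and says nothing about how refdb runs the query on the server.

```python
# Made-up mini database: ID -> (first author, publication year).
refs = {
    1: ("Miller", 1999),
    2: ("Miller", 2000),
    3: ("Miller", 1997),
    4: ("Smith", 2000),
    5: ("Jones", 1999),
}

def ids_where(predicate):
    """Return the set of IDs whose reference satisfies the predicate."""
    return {ref_id for ref_id, ref in refs.items() if predicate(ref)}

miller = ids_where(lambda ref: ref[0].startswith("Miller"))
py1999 = ids_where(lambda ref: ref[1] == 1999)
py2000 = ids_where(lambda ref: ref[1] == 2000)

# Left to right, no parentheses: (Miller AND 1999) OR 2000
ungrouped = (miller & py1999) | py2000

# With parentheses: Miller AND (1999 OR 2000)
grouped = miller & (py1999 | py2000)

print(sorted(ungrouped))   # includes Smith's paper from 2000
print(sorted(grouped))     # only Miller's papers from 1999 and 2000
```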

Which format do you want to use?

refdb can output references in a variety of formats. To request a specific output format, use the -t option of the getref command. The following table gives a short overview of the available formats. The name is what you have to specify with the -t option. The specifics of these formats will be described below.

Table 2-1. refdb reference output formats

Name	File format	Purpose
scrn	plain text	display of search results in a terminal window
html	HTML 4.01	display of search results in a web browser
xhtml	XHTML 1.0	display of search results in an XML-aware web browser
ris	RIS	editing references, export of references to other reference management programs, backup of databases
risx	risx	editing references, export of references to XML-aware programs, backup of databases
db31	DocBook	manually creating bibliographies for DocBook SGML documents
db31x	DocBook	manually creating bibliographies for DocBook XML documents
teix	TEI P4	manually creating bibliographies for TEI XML documents
bibtex	BibTeX	export of references to a BibTeX file

scrn

The screen backend provides a basic data output for viewing in a terminal, preferably through a pager. By default, the reference ID, the publication year, the authors, the title, and the source information are displayed. You can use the -s option to additionally display the abstract (AB or N2), the notes (N1), the reprint info (RP), the address (AD), the publisher (PB), the city (CY), the URL (UR), and the user (U1 through U5) and misc (M1 through M3) fields. You can concatenate these identifiers, e.g. -s N1N2 would additionally display the notes and the abstract. -s ALL will display all available fields.

html

The html backend works just like the scrn backend, but encodes this information as an HTML document. This comes in handy if you would like to view the results of your queries in a web browser rather than in a terminal window. You simply use the -o switch to write the results of your queries to a file (see below), reusing the same filename for each query. After each query you just have to hit the reload button of your browser to view the results of the most recent query.

ris

This is identical to the input RIS format. Use it to edit references, to export them to other reference management systems, or to make RIS backups of your databases.

risx

The same holds true for this format which is the XML equivalent of RIS. The advantage of this format over RIS in terms of backing up your database is that it holds the personal information of all users of the database, instead of the information of only the current user.

bibtex

This backend provides output formatted for use as a bibtex reference database. This can be used with the tex and bibtex applications to create bibliographies for documents written with Donald Knuth's famous TeX typesetting system. The -s option cannot be used with this backend and will be ignored.

db31

The DocBook SGML backend formats the query result as a bibliography element in an SGML document using the DocBook DTD. refdb outputs an appropriate doctype string at the beginning of the data. The string is commented out so the contents can be directly inserted into a larger document by some processing application. If you need the data as a standalone document, simply uncomment the first line. The -s option cannot be used with this backend and will be ignored. The name "db31" means that the output will work with DocBook SGML 3.1 or later.

db31x

The output is essentially the same as with the preceding backend, but you'll get a DocBook XML document instead, compatible with all released versions of the DocBook XML DTD (the name is sort of a misnomer because there was no XML version of DocBook 3.1).

teix

The TEI XML backend formats the query results as a TEI listBibl element according to TEI P4. refdb outputs an appropriate processing instruction and doctype string at the beginning of the data. The string is commented out so the contents can be directly inserted into a larger document by some processing application. If you need the data as a standalone document, simply uncomment the first line. The -s option cannot be used with this backend and will be ignored.

Where do you want to put the results?

By default, all getref output will be sent to your favourite pager unless you request a different target. The pager is fine as long as you try to find references, but when you want to edit a reference or reuse references in a markup document, you have to save the search results in a file. To this end, use one of the -o and -O options along with the filename where the data should end up. The lowercase -o option will either create the file or overwrite an existing file with the same name. The uppercase -O option will either create the file or append to an existing file with the same name. For example, the following command will write the query results to a file in HTML format. If the file already exists, e.g. from a previous query, it will be overwritten.

refdbc: getref -t html -o refs.html :AU:~^Miller
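The difference between -o and -O corresponds to the usual "write" and "append" file modes. If you are unsure which one you need, the following Python sketch demonstrates the behaviour on a temporary file. The helper save and the sample texts are made up for illustration and are not part of refdb.

```python
import os
import tempfile

def save(path, text, append=False):
    """Write text to path: overwrite like -o, or append like -O."""
    mode = "a" if append else "w"
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(text)

def contents(path):
    """Return the current contents of the file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "refs.txt")
    save(target, "first query\n")                # creates the file
    save(target, "second query\n")               # overwrites: the first result is gone
    after_overwrite = contents(target)
    save(target, "third query\n", append=True)   # appends to the existing file
    after_append = contents(target)

print(after_overwrite, end="")
print("---")
print(after_append, end="")
```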

The -c option pipes the data into a shell command. This shell command can basically be anything that is able to read from the standard input. You could even specify a different pager with this option, although this is arguably not too exciting. However, there are more interesting commands as shown in the following examples:

refdbc: getref -c lpr :AU:~^Miller

This will output all references with authors whose names start with "Miller" on your printer. The plain text scrn format is used as we didn't request any particular format.

To find out how many words and characters the abstract of a particular reference contains, try the following:

refdbc: getref -t ris -c "grep '^N2  - ' | wc" :ID:=7
       1     255    1696

The output of reference 7 in RIS format will be piped to the command specified with the -c option. This happens to be another pipe. The refdb output will first be piped into grep, which isolates the abstract field. This line is then piped into wc which in turn outputs the line, word, and character count.

Note: This example assumes that all of the abstract field is in one line. If this is not the case, you'd need a more complex command. This is left to your ingenuity.
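One possible take on the multi-line case is to do the counting in a small script instead of grep and wc. The following Python sketch collects the N2 field including any continuation lines and counts the words. It rests on assumptions: the sample record is made up, the function field_text is hypothetical, and continuation lines are assumed to be all lines up to the next line that starts with a two-character tag followed by two spaces and a hyphen.

```python
import re

# A line that starts a new RIS field: two-character tag, two spaces, hyphen.
TAG_LINE = re.compile(r"^[A-Z][A-Z0-9]  -")

def field_text(ris_record, tag):
    """Return the text of the first field `tag`, joining continuation lines."""
    collected = []
    inside = False
    for line in ris_record.splitlines():
        if TAG_LINE.match(line):
            if inside:
                break                      # the next tag ends the field
            if line.startswith(tag + "  -"):
                inside = True
                collected.append(line[5:].strip())
        elif inside:
            collected.append(line.strip())
    return " ".join(collected)

# Made-up record with an abstract that spans two lines.
sample = "\n".join([
    "TY  - JOUR",
    "ID  - 7",
    "N2  - A short made-up abstract",
    "that continues on a second line.",
    "PY  - 2002",
    "ER  - ",
])

abstract = field_text(sample, "N2")
words = len(abstract.split())
print(words, "words,", len(abstract), "characters")
```

A script like this could be used as the argument of the -c option, reading the record from standard input instead of a sample string.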

Along the same lines we could also make a listing of the frequency of all words in the abstract, regardless of whether this really provides usable hints for categorizing the reference:

refdbc: getref -t ris -c "grep '^N2  - ' | tr ' ' '\n' | sort | uniq -c | less" :ID:=7

As above, the grep command isolates the abstract field. This time the field is piped into tr which we ask to replace every space with a newline, so the resulting data will have one word per line. We then use sort to order the list alphabetically and uniq -c to collapse duplicate words into a single line each, prefixed with the number of occurrences. The result will then be displayed in the pager less. Phew. This is admittedly rather an exercise in self-indulgence, but by now you should have gathered that you can use arbitrarily complex shell commands as the argument to the -c option.
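For comparison, the same word count can be sketched in a few lines of Python. The sample text is made up and the function word_frequencies is hypothetical. Note one small difference to the shell pipeline: split() treats any run of whitespace as a single separator, whereas tr turns every single space into a newline.

```python
from collections import Counter

def word_frequencies(text):
    """Count how often each whitespace-separated word occurs."""
    return Counter(text.split())

# Made-up abstract text.
abstract = "the receptor binds the ligand and the ligand activates the receptor"

# Alphabetical listing with the count in front, similar to sort | uniq -c.
for word, count in sorted(word_frequencies(abstract).items()):
    print(f"{count:7d} {word}")
```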

I can't resist presenting a final example which is a variation of the old "eat your own dogfood" theme. refdbc can also run in a batch mode (without entering the interactive console mode) and receive data at standard input for certain commands. We can use this feature to update references on the fly. Look at the following command:

refdbc: getref -t ris -c "sed 's/ON REQUEST.*/IN FILE/' | refdbc -C updateref -P" :ID:=7

getref retrieves the reference with the ID 7 in RIS format and pipes the result into sed. We have sed replace the string "ON REQUEST" (including the trailing date info, if present) with "IN FILE", which is common when you receive a long-awaited reprint of a paper which is already in your database. Then we pipe the modified reference into another instance of refdbc. The -C command line option causes this instance of refdbc to run the updateref command using the data received at standard input and exit when done.

Note: To avoid undesired results it is recommended to do a dry run first and see whether the converted data look like they should. To this end, simply pipe the results of the sed command into less. If you like the results, hit the up key and replace less with the refdbc call as in the example above.
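If you prefer to check the substitution itself before involving refdbc at all, the following Python sketch applies the same replacement to a made-up RIS record. The record, the date text, and the function mark_in_file are assumptions for illustration only. As with sed, the ".*" stops at the end of the line, so the other fields are left alone.

```python
import re

def mark_in_file(ris_text):
    """Replace 'ON REQUEST' and the rest of that line with 'IN FILE'."""
    return re.sub(r"ON REQUEST.*", "IN FILE", ris_text)

# Made-up record; the text after ON REQUEST stands in for the date info.
sample = "\n".join([
    "TY  - JOUR",
    "ID  - 7",
    "RP  - ON REQUEST (made-up date)",
    "ER  - ",
])

preview = mark_in_file(sample)
print(preview)
```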
