Things to know before you start

Which database server?

refdb currently supports MySQL and PostgreSQL as external database servers as well as SQLite as an embedded database engine. This section tries to help you decide which one to pick.

The first issue is whether you want to run an external database server or not. External database servers scale better if many users share databases and they provide access control. The external database servers also use more fine-grained locking mechanisms which allow concurrent read and write accesses, whereas the SQLite engine will lock the entire database for write accesses. However, the latter does not provide access control and thus doesn't require any sort of user administration.

Rule #1. If you don't intend to share databases, or if running a database server scares you in any way, then you're better off with SQLite.

Another issue is the way how the database engines store their data. SQLite is unique in that it uses a single architecture-independent file per database which makes transferring the data to a different box a breeze. The external database engines use more sophisticated ways to organize their data, but you need some basic administrative skills in order to replicate the data.

Rule #2. If you cannot rely on remote access to your databases (something which refdb is well suited for) but have to take your data physically with you while travelling, SQLite is a better choice.

Now some words about the external database servers. As with many other fundamental schisms in the Unix world (vi vs. Emacs, KDE vs. Gnome, to name a few), both database servers supported by refdb have followers which are semi-religious about their choice. This leads us to:

Rule #3. If you already use one of the servers, then by all means use it also for refdb. Being familiar with the server and having it happily running usually outweighs any advantages that the other server might have.

But what if you do not yet run a suitable database server? You can browse the web and read for hours about the differences between MySQL and PostgreSQL, but for the purpose of managing refdb reference databases it boils down to two essential differences: MySQL is faster, but PostgreSQL has transaction support.

Note: For those not familiar with database terminology: transaction support basically means that the database server employs mechanisms to ensure that a transaction, that is a group of related commands affecting data in a database, either is fully completed or not at all. In the case of refdb this means that e.g. a reference is either added completely or not at all if something goes wrong inbetween. In systems without transaction support you might get into the situation that only half the information of a reference is added if some cleaning personnel accidentally pulls the plug while you add a reference.

This leads us to:

Rule #4. If you cherish speed over anything else, use MySQL. If you cherish data integrity over anything else, use PostgreSQL.

The last two rules need a little explanation.

Where do the components go?

As refdb is a three-tiered client-server application, you have considerable freedom to distribute the components among your computers. Although refdb shines in a network environment, there is absolutely no problem to run all components on a single standalone workstation.

Note: Please keep in mind that there's one tier less if you choose the SQLite embedded database engine. The databases will always be on the filesystem of the machine that runs refdbd (this doesn't exclude putting the files on an NFS share if you have a good reason to do so).

The basic idea of the client-server model has several implications:

The most common scenarios for using refdb will be on a department or institute network and on a standalone workstation. Some additional thrill is added by the fact that refdb can also provide a part of its functionality through a CGI-based web interface, so we'll look at this as a third scenario.

Installation on a standalone workstation

This is obviously the simplest case. The clients, the application server, the database server, and the databases reside on the same physical machine (see Figure 4-1). The only requirement for the workstation is that a TCP/IP network is installed. This is necessary as the three layers of refdb always communicate via TCP/IP sockets. The IP address 127.0.0.1 has to be specified in the configuration files of the clients and of the application server.

Figure 4-1. refdb on a standalone workstation

Installation in a network

In a network you can take advantage of the client-server model and distribute the workload between your computers. Although the three layers can well be distributed between three physical machines, it may be more useful to install the application server on the same machine as the database server and the databases (see Figure 4-2). A dedicated or general-purpose server may be most suitable to hold these components, as a workstation may get sluggish if it has to answer a lot of database requests.

The clients as well as scripts and support files have to be installed on all workstations that will be used to access the databases. The client for administrative tasks, refdba, can be restricted to the workstations of system administrators or otherwise experienced staff.

Figure 4-2. refdb on a network

Installation as a web application

You can of course set up the CGI functionality in parallel to any of the above mentioned installations. All you need to do is to keep copies of refdbc, bib2ris, and med2ris.pl on the computer that runs your web server (see Figure 4-3). refdbc can be configured to communicate with refdbd on the same or a different computer in the network, so for the rest of the installation you are free to use one of the above scenarios. The command-line clients refdba, refdbc, and refdbib should be available on at least one computer in the network for all those tasks that cannot be performed through the web interface.

Figure 4-3. Example of refdb as a web application

The mystery of the configuration files

refdb tries to be a well-behaved Unix-style software package and is thus able to pull a lot of information out of cryptic configuration files. This section tries to introduce you into the mysteries of the various levels of configuration files.

The purpose of the configuration files is to set some reasonable default values for the command-line switches of the refdb programs. Once you have set these, you will never have to specify these values on the command line again, unless you want to temporarily override them.

Types of configuration files

All refdb applications and scripts that use configuration files (these are the server refdbd, the clients refdbc, refdbib, refdba, as well as the conversion filters bib2ris, db2ris, med2ris.pl, marc2ris.pl, en2ris.pl, and nmed2ris) can use two configuration files each. One global configuration file is supplied by the system administrator and can be used to set values that are common for all users on that box, like the IP address of the application server. Another file can be used by every user to supply the values that were not set in the global file or to override settings in this file. The users' copies can have a leading dot to hide the files (the refdb programs will first try to read a hidden configuration file, and only if that cannot be found they try to read a non-hidden file).

refdbc, bib2ris, marc2ris.pl, and med2ris.pl use a second global configuration file if they are run as a CGI applications. A local configuration file does not make sense in this case.

The default location for the global configuration files is /usr/local/etc/refdb. There are two ways to change this. If you compile refdb from the sources you can specify a different directory with the --prefix or --sysconfdir options of ./configure. E.g. if you specify --sysconfdir=/etc, then the configuration files will be installed in /etc/refdb (the refdb part is automatically appended by the refdb install routines). If you use precompiled binaries, use the -y command line option to specify the directory. In this case you have to specify the full path, i.e. /etc/refdb to read the configuration files installed by the previous example.

The user copies of the client configuration files are expected to be in the users' home directories as specified by the environment variable HOME.

Configuration file syntax

All configuration files share a common syntax. There are just three essential things to know:

  • All information is stored as pairs of whitespace-separated items, one pair on each line. The first item on the line specifies the variable name, the second item specifies the variable value. Whitespace means one or more spaces or tabs in any combination.

  • Everything to the right of a hash sign (#) is a comment. The rest of the line is ignored.

  • The line endings are Unix-style (0x10, not DOS-style 0x13 0x10)

A configuration example

The whole configuration stuff may sound a bit confusing, so let us now look at a simple configuration example that illustrates the principles laid out above.

The following is a listing of /usr/local/etc/refdb/refdbcrc, our global refdbc configuration file in this example:

# This is the global configuration file for refdbc
serverip        127.0.0.1
port    9734
pager   more
timeout 60
# end of refdbcrc
       

This is the corresponding copy that one of the users of the system created as /home/markus/.refdbcrc:

# This is the user configuration file for refdbc
pager   less
username        markus
passwd  * 
timeout 30
# end of .refdbcrc
       

As you can see our hypothetical system administrator configured the IP address (serverip) and the port where refdbd listens to the client requests. This value is most likely the same for all users on the system, so this is nothing to worry about for the users. more is defined as the default pager, and the timeout is set to 1 minute.

The hypothetical user does not like more as a pager and prefers to use less instead. He also thinks that half a minute as a timeout should be enough. Both of these settings override the corresponding values in the global file. serverip and port are not redefined in the user's copy, so the values of the global file take effect. The prudent user also defined username and passwd so the correct values will be used for the database access (the asterisk in the passwd field will cause refdbc to ask for the password interactively for security reasons).

Configuration file variables

For a listing of available configuration file variables please see the tables for refdba, refdbc, refdbib, refdbd, bib2ris, db2ris, med2ris.pl, marc2ris.pl, en2ris.pl, and nmed2ris.

Environment variables

refdb uses the following environment variables to locate the files and directories it needs to run properly.

HOME

This variable should be set for all users anyway. It is used to locate the personal configuration files for the refdb clients.

Some notes on the filesystem

The default installation procedure will install the refdb files in locations compatible with the filesystem hierarchy standard. You will learn in the following sections how to change where the refdb files will be installed if you want to adapt the installation to specific needs of your system. To get a better idea of what you have to take care of if you don't like the defaults, here is a list of the directories used by refdb:

/usr/local/bin

This directory will receive all binary files and shell scripts.

/usr/local/etc/refdb

All global refdb configuration files end up in this directory.

/usr/local/share/refdb

This directory contains shareable, operating system independent files. The files are organized in a couple of subdirectories:

  • css contains a cascading stylesheet suitable for the HTML output of the getref command.

  • db holds the database files of embedded database engines.

  • declarations contains the default SGML declarations.

  • dsssl contains DSSSL stylesheets.

  • dtd contains the document type definitions used by refdb.

  • examples contains a few example reference data files as well as SGML and XML test documents using refdb citations.

  • site-lisp contains an Emacs Lisp file that implements the RIS mode.

  • sql contains SQL scripts used to initialize databases.

  • styles contains some XML files containing bibliography styles.

  • templates contains HTML fragments used by refdbd, bib2ris, and nmed2ris to create dynamic HTML pages for the refdb web interface.

  • www contains the static HTML pages for the refdb web interface.

  • xsl contains XSLT stylesheets.