Things to know before you start

Which database server?

RefDB currently supports MySQL and PostgreSQL as external database servers as well as SQLite as an embedded database engine. This section tries to help you decide which one to pick.

The first issue is whether you want to run an external database server or not. External database servers scale better if many users share databases and they provide access control. The external database servers also use more fine-grained locking mechanisms which allow concurrent read and write accesses, whereas the SQLite engine will lock the entire database for write accesses. However, the latter does not provide access control and thus doesn't require any sort of user administration.

Rule #1. If you don't intend to share databases, or if running a database server scares you in any way, then you may better off with SQLite.

Another issue is the way how the database engines store their data. SQLite is unique in that it uses a single architecture-independent file per database which makes transferring the data to a different box a breeze. The external database engines use more sophisticated ways to organize their data, but you need some basic administrative skills in order to replicate the data.

Rule #2. If you cannot rely on remote access to your databases (something which RefDB is well suited for) but have to take your data physically with you while travelling, SQLite is a better choice.

Now some words about the external database servers. As with many other fundamental schisms in the Unix world (vi vs. Emacs, KDE vs. Gnome, to name a few), both database servers supported by RefDB have followers who are semi-religious about their choice. Both MySQL and PostgreSQL are robust and well-proven. This leads us to:

Rule #3. If you already use one of the servers, then by all means use it also for RefDB. Being familiar with the server and having it happily running usually outweighs any advantages that the other server might have.

But what if you do not yet run a suitable database server? You can browse the web and read for hours about the differences between MySQL and PostgreSQL, but for the purpose of managing RefDB reference databases it boils down to one essential difference: MySQL is faster.

This leads us to:

Rule #4. If you cherish speed over anything else, use MySQL.

Note

There's a few more differences that you should be aware of: PostgreSQL has transaction support by default. MySQL supports transactions only if you use InnoDB tables. If you want this additional peace of mind from MySQL, make sure InnoDB is the default table type. SQLite does not support Unix-style regular expressions. If you'd like to use these more versatile expressions instead of the simpler SQL regular expressions supported by SQLite, choose MySQL or PostgreSQL.

Where do the components go?

As RefDB is a three-tiered client-server application, you have considerable freedom to distribute the components among your computers. Although RefDB shines in a network environment, there is absolutely no problem to run all components on a single standalone workstation.

Note

Please keep in mind that there's one tier less if you choose the SQLite embedded database engine. The databases will always be on the filesystem of the machine that runs refdbd (this doesn't exclude putting the files on an NFS share if you have a good reason to do so).

The basic idea of the client-server model has several implications:

  • Many workstations can access a single server running the database server. Thus many people can access the same databases without the pain of duplicating the data and the database engine on every single machine.

  • A considerable part of the computing effort is done outside of the workstations. Therefore even rather lame workstations may be sufficient to access and manipulate the data. The database server should run on a decent machine, though (better not that dusty 486 that has doubled as a paperweight since 1990).

  • Updates of the software will mainly affect the database server and the application server. This considerably reduces your workload, as the workstations need to be updated less frequently.

The most common scenarios for using RefDB will be on a department or institute network and on a standalone workstation. Let's see how these scenarios differ:

Installation on a standalone workstation

This is obviously the simplest case. The clients, the application server, the database server, and the databases reside on the same physical machine (see Figure 4.1, “RefDB on a standalone workstation”). The only requirement for the workstation is that a TCP/IP network is installed. This is necessary as the three layers of RefDB always communicate via TCP/IP sockets. The IP address 127.0.0.1 has to be specified in the configuration files of the clients and of the application server.

Figure 4.1. RefDB on a standalone workstation

RefDB on a standalone workstation

Installation in a network

In a network you can take advantage of the client-server model and distribute the workload between your computers. Although the three layers can well be distributed between three physical machines, it may be more useful to install the application server on the same machine as the database server and the databases (see Figure 4.2, “RefDB on a network”). A dedicated or general-purpose server may be most suitable to hold these components, as a workstation may get sluggish if it has to answer a lot of database requests.

The clients as well as scripts and support files have to be installed on all workstations that will be used to access the databases. The client for administrative tasks, refdba, can be restricted to the workstations of system administrators or otherwise experienced staff.

Figure 4.2. RefDB on a network

RefDB on a network

The mystery of the configuration files

Like with most Unix-style software packages, the behaviour of the RefDB applications can be tweaked by configuration files. Wherever it makes sense, there is one global config file with useful admin-picked defaults, and another user config file for the individual user to play with. The purpose of the configuration files is to set some reasonable default values for the command-line switches of the RefDB programs. Once you have set these, you will never have to specify these values on the command line again, unless you want to temporarily override them.

Types of configuration files

All RefDB applications and scripts that use configuration files (these are the server refdbd, the clients refdbc, refdbib, refdba, the script refdbxml, as well as the conversion filters bib2ris, db2ris, med2ris.pl, marc2ris, and en2ris) can use two configuration files each. One global configuration file is supplied by the system administrator and can be used to set values that are common for all users on that box, like the IP address of the application server. Another file can be used by every user to supply the values that were not set in the global file or to override settings in this file. The users' copies can have a leading dot to hide the files (the refdb programs will first try to read a hidden configuration file, and only if that cannot be found they try to read a non-hidden file).

bib2ris, marc2ris, and med2ris use a second global configuration file if they are run as a CGI applications. A local configuration file does not make sense in this case.

The default location for the global configuration files is /usr/local/etc/refdb. There are two ways to change this. If you compile RefDB from the sources you can specify a different directory with the --prefix or --sysconfdir options of ./configure. E.g. if you specify --sysconfdir=/etc, then the configuration files will be installed in /etc/refdb (the refdb part is automatically appended by the RefDB install routines). If you use precompiled binaries, use the -y command line option to specify the directory. In this case you have to specify the full path, i.e. /etc/refdb to read the configuration files installed by the previous example.

The user copies of the client configuration files are expected to be in the users' home directories as specified by the environment variable HOME.

Configuration file syntax

All configuration files share a common syntax. There are just three essential things to know:

  • All information is stored as pairs of whitespace-separated items, one pair on each line. The first item on the line specifies the variable name, the second item specifies the variable value. Whitespace means one or more spaces or tabs in any combination.

  • Everything to the right of a hash sign (#) is a comment. The rest of the line is ignored.

  • The line endings are Unix-style (0x10, not DOS-style 0x13 0x10), regardless of the operating system.

A configuration example

The whole configuration stuff may sound a bit confusing, so let us now look at a simple configuration example that illustrates the principles laid out above.

The following is a listing of /usr/local/etc/refdb/refdbcrc, our global refdbc configuration file in this example:

# This is the global configuration file for refdbc
serverip	127.0.0.1
port	9734
pager	more
timeout	180
# end of refdbcrc
	

This is the corresponding copy that one of the users of the system created as /home/joe/.refdbcrc:

# This is the user configuration file for refdbc
pager	less
username	joesixpack
passwd  * 
timeout	30
# end of .refdbcrc
	

As you can see our hypothetical system administrator configured the IP address (serverip) and the port where refdbd listens to the client requests. This value is most likely the same for all users on the system, so this is nothing to worry about for the users. more is defined as the default pager, and the timeout is set to 3 minutes.

Joe Sixpack, our reckless user, does not like more as a pager and prefers to use less instead. He also thinks that half a minute as a timeout should be enough. Both of these settings override the corresponding values in the global file. serverip and port are not redefined in the user's copy, so the values of the global file take effect. Joe also defined username (which happens to be different from his login name "joe") and passwd so the correct values will be used for the database access (the asterisk in the passwd field will cause refdbc to ask for the password interactively for security reasons).

Configuration file variables

For a listing of available configuration file variables please see the tables for refdba, refdbc, refdbib, refdbd, refdbxml, bib2ris, db2ris, med2ris, marc2ris, and en2ris.

Environment variables

refdb uses the following environment variables to locate the files and directories it needs to run properly.

HOME

This variable should be set for all users anyway. It is used to locate the personal configuration files for the RefDB clients.

SGML_CATALOG_FILES

If you process SGML files, this variable will be consulted to locate the catalog files required for resolving public identifiers to their local filename equivalents.

Note

On some systems, the package system maintains a master catalog whose path is hard-coded into the SGML applications. In this case, the variable is not required.

XML_CATALOG_FILES

If you process XML files, this variable may be consulted to locate XML catalogs. If this variable is not set, many tools look into the default location /etc/xml/catalog instead. Remember that some XSLT processors need access to additional Java classes to provide XML catalog support at all.

Some notes on the filesystem

The default installation procedure will install the RefDB files in locations compatible with the filesystem hierarchy standard. You will learn in the following sections how to change where the RefDB files will be installed if you want to adapt the installation to specific needs of your system. To get a better idea of what you have to take care of if you don't like the defaults, here is a list of the directories used by RefDB:

/usr/local/bin

This directory will receive all binary files and shell scripts.

/usr/local/etc/refdb

All global RefDB configuration files end up in this directory.

/usr/local/share/refdb

This directory contains shareable, operating system independent files. The files are organized in a couple of subdirectories:

  • css contains a cascading stylesheet suitable for the HTML output of the getref command.

  • declarations contains the default SGML declarations.

  • dsssl contains DSSSL stylesheets.

  • dtd contains the document type definitions used by RefDB.

  • examples contains a few example reference data files as well as SGML and XML test documents using RefDB citations.

  • sql contains SQL scripts used to initialize databases.

  • sru contains the XSLT and CSS stylesheets required to set up the SRU service

  • styles contains some XML files containing bibliography styles.

  • xsl contains XSLT stylesheets.

/usr/local/var/lib/refdb/db

holds the database files of embedded database engines and a version file for use by package installation scripts