SRU Operations


Note

This section assumes that you run the SRU service using the CGI application. If you use the standalone server instead, please adapt the URLs by replacing "http://mybox.com/cgi-bin/" with "http://localhost:8080/".
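If you switch between the two setups often, the prefix replacement described in this note can be scripted. The following is only a minimal sketch: the helper name `to_standalone` is hypothetical, and the two prefixes are the ones quoted in the note above.

```python
# Prefixes taken from the note above; adjust them to your own installation.
CGI_PREFIX = "http://mybox.com/cgi-bin/"
STANDALONE_PREFIX = "http://localhost:8080/"


def to_standalone(url):
    """Rewrite a CGI-style URL so that it points at the standalone server.

    URLs that do not start with the CGI prefix are returned unchanged.
    """
    if url.startswith(CGI_PREFIX):
        return STANDALONE_PREFIX + url[len(CGI_PREFIX):]
    return url


print(to_standalone("http://mybox.com/cgi-bin/refdbsru/?operation=explain&version=1.1"))
```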

SRU defines three operations, all of which return XML documents:

explain: describes the available facilities in terms of record schemas, available indexes and so on. Sort of a cheat sheet. The original specification is here.
searchRetrieve: performs a database query and retrieves the matching datasets. The original specification is here.
scan: retrieves a list of matching search terms for later use in a searchRetrieve operation. The original specification is here.

You are encouraged to peruse the linked specifications above to learn the general principles. The following sections build on this knowledge and describe the RefDB SRU interface with a focus on its peculiarities and limitations. We'll assume that your web server is set up to run the refdbsru CGI script using the following URL: http://mybox.com/cgi-bin/refdbsru/.

The explain operation

The explain operation is the simplest of all and a good start to introduce the syntax of the SRU interface. RefDB fully supports the explain operation. Any of the following URLs typed into your browser will run it:

http://mybox.com/cgi-bin/refdbsru/
http://mybox.com/cgi-bin/refdbsru/?
http://mybox.com/cgi-bin/refdbsru/?operation=explain&version=1.1

The URL part following the question mark ("?") in the third example is the search-part which consists of "parameter=value" pairs glued together with ampersands ("&"). Both parameters shown here are mandatory for all SRU operations as we'll see shortly.
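If you prefer to assemble the search-part in a script rather than typing it, the "parameter=value" pairs can be built with Python's standard library. This is a minimal sketch, not part of RefDB; the function name `explain_url` is hypothetical and the base URL is the one assumed throughout this section.

```python
from urllib.parse import urlencode

# Base URL assumed throughout this section.
BASE_URL = "http://mybox.com/cgi-bin/refdbsru/"


def explain_url(base_url=BASE_URL, version="1.1"):
    """Return the URL of an explicit explain request."""
    # urlencode joins the parameter=value pairs with ampersands.
    params = {"operation": "explain", "version": version}
    return base_url + "?" + urlencode(params)


print(explain_url())
# prints http://mybox.com/cgi-bin/refdbsru/?operation=explain&version=1.1
```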

The query will return an XML document describing the capabilities of the RefDB SRU interface.

The searchRetrieve operation

The searchRetrieve operation is the one used to actually get hold of the reference data you're looking for. Your query is sent in the query parameter which is mandatory for this operation. A few examples:

http://mybox.com/cgi-bin/refdbsru/?operation=searchRetrieve&version=1.1&recordSchema=mods&query=bib.name%3d%22Miller,Henry J.%22
http://mybox.com/cgi-bin/refdbsru/?operation=searchRetrieve&version=1.1&recordSchema=risx&query=dc.subject%3d%22circular dichroism%22+or+dc.subject%3d%22NMR%22

The first example requests the bibliographic data in MODS format. The query proper reads 'bib.name="Miller, Henry J."' and translates to a search for all references where a person with that name is listed as an author, editor, or series editor. The second example requests the data in risx format and searches all references with the keywords "circular dichroism" or "NMR".

Both examples make use of percent encoding to make the URL string conform to the specs. This is further discussed below.
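A script can take care of the percent encoding for you. The sketch below builds a searchRetrieve URL from a plain CQL string; the function name `search_retrieve_url` is hypothetical. Note that `urlencode` writes spaces as "+" (as in the second example above) and emits upper-case hex digits such as %3D, which are equivalent to the lower-case forms used in this section.

```python
from urllib.parse import urlencode

# Base URL assumed throughout this section.
BASE_URL = "http://mybox.com/cgi-bin/refdbsru/"


def search_retrieve_url(cql, record_schema="mods", base_url=BASE_URL, version="1.1"):
    """Return a searchRetrieve URL for the given unencoded CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "recordSchema": record_schema,
        "query": cql,  # urlencode percent-encodes '=', '"', ',' and spaces here
    }
    return base_url + "?" + urlencode(params)


print(search_retrieve_url('bib.name="Miller, Henry J."'))
print(search_retrieve_url('dc.subject="circular dichroism" or dc.subject="NMR"', "risx"))
```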

The query parameter

The query parameter describes the criteria of your database query and is a string using the Common Query Language.

Conformance

The RefDB SRU support conforms to CQL Level 2. The following general restrictions apply:

RefDB does not support persistent result sets. Therefore, the resultSetTTL request parameter is meaningless, and it is not possible to reference a result set in a subsequent query.
RefDB does not support XPath expressions to modify the results. Therefore the recordXPath request parameter is not honored. You can of course apply any XPath expressions on the client side using an appropriate processor.
Sorting is currently not supported, and the sortKeys parameter is not applicable. Data will always be sorted by ID.
The recordPacking parameter is not supported. Records are always returned as XML.
RefDB does not support relation modifiers and boolean modifiers in CQL queries.
prox is not supported as a boolean operator.
The relation encloses is not supported.
The support for regular expressions ("masking" in CQL) depends on the database backend. Most notably, anchoring is not supported by SQLite and SQLite3.
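A client that was written for a more complete SRU server may send request parameters that RefDB ignores. The following sketch shows one way to catch these on the client side before the request is sent. It is only an illustration: the names `UNSUPPORTED_PARAMS` and `split_supported` are hypothetical, and the reasons are paraphrased from the list above.

```python
# Request parameters that the restrictions above describe as having no effect.
UNSUPPORTED_PARAMS = {
    "resultSetTTL": "persistent result sets are not supported",
    "recordXPath": "XPath expressions are not applied by the server",
    "sortKeys": "sorting is not supported; data is sorted by ID",
    "recordPacking": "records are always returned as XML",
}


def split_supported(params):
    """Split a parameter dict into (usable parameters, warning messages)."""
    kept = {k: v for k, v in params.items() if k not in UNSUPPORTED_PARAMS}
    warnings = [
        "{}: {}".format(k, UNSUPPORTED_PARAMS[k])
        for k in params
        if k in UNSUPPORTED_PARAMS
    ]
    return kept, warnings


kept, warnings = split_supported(
    {"operation": "searchRetrieve", "version": "1.1", "query": "cat", "sortKeys": "title"}
)
for message in warnings:
    print("ignored by RefDB:", message)
```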

Defaults

If a query or a query part does not specify an index and a relation, RefDB looks for the term in the author, keyword, and title indexes:

http://mybox.com/cgi-bin/refdbsru/?operation=searchRetrieve&version=1.1&query=cat

This query will try to find references that contain the string "cat" in either the title, a keyword, or an author name.

Context sets

RefDB supports the context sets Dublin Core (dc) and the not yet officially released CQL Bibliographic Searching (bib). The following table lists the relationship of the indexes defined in these context sets with the RefDB fields.

Note

RefDB of course implicitly also supports the cql context set.

Table 11.1. Context sets

dc index	bib index	RefDB field	search/scan?	description
title	title	TX	y/n	item titles
	seriesTitle	T3	y/n	series title
	titleAbbrev	JA	y/y	journal title, abbreviated
creator, contributor	name, namePersonal, nameCorporate	AX	y/y	authors and editors
subject, coverage	subject	KW	y/y	keywords
date	dateIssued	PY	y/n	publication date
	volume	VL	y/n	periodical volume
	issue	IS	y/n	periodical issue
	startPage	SP	y/n	start page
	endPage	EP	y/n	end page
publisher		PB	y/n	publisher
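For scripted clients it can be handy to have Table 11.1 available as a lookup. The sketch below transcribes the table into a dictionary, assuming that every index in a row maps to that row's RefDB field and that the scan column applies to all indexes in the row. The names `INDEX_TO_FIELD`, `refdb_field`, and `can_scan` are hypothetical, and the lookup is exact-match only.

```python
# Qualified index name -> RefDB field, transcribed from Table 11.1.
INDEX_TO_FIELD = {
    "dc.title": "TX", "bib.title": "TX",
    "bib.seriesTitle": "T3",
    "bib.titleAbbrev": "JA",
    "dc.creator": "AX", "dc.contributor": "AX",
    "bib.name": "AX", "bib.namePersonal": "AX", "bib.nameCorporate": "AX",
    "dc.subject": "KW", "dc.coverage": "KW", "bib.subject": "KW",
    "dc.date": "PY", "bib.dateIssued": "PY",
    "bib.volume": "VL",
    "bib.issue": "IS",
    "bib.startPage": "SP",
    "bib.endPage": "EP",
    "dc.publisher": "PB",
}

# Fields marked "y" in the scan half of the search/scan column.
SCANNABLE_FIELDS = {"JA", "AX", "KW"}


def refdb_field(index):
    """Return the RefDB field for a qualified index name, or None if unknown."""
    return INDEX_TO_FIELD.get(index)


def can_scan(index):
    """Return True if the table marks the index's field as available for scan."""
    return refdb_field(index) in SCANNABLE_FIELDS


print(refdb_field("bib.name"), can_scan("bib.name"))
print(refdb_field("dc.title"), can_scan("dc.title"))
```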

Encoding

As you may have noticed, it is necessary to percent-encode a few special characters in the parameter values. For example, the equal sign ("=") that assigns a value to a parameter does not have to be encoded. However, equal signs within the CQL query string (which is the value of the query parameter) must be percent-encoded. If you use a dedicated client to run your queries, you should not have to care about these conversions. If you use a web browser or a similar tool, you may find the following conversion table useful:

Table 11.2. Percent-encoding special characters

replace	with	replace	with
:	%3a	/	%2f
?	%3f	#	%23
[	%5B	]	%5D
@	%40	!	%21
$	%24	&	%26
'	%27	(	%28
)	%29	*	%2a
+	%2b	,	%2c
;	%3b	=	%3d
%	%25	"	%22
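If you script your queries, you do not need to apply the table by hand. As a minimal sketch, Python's `urllib.parse.quote` with an empty `safe` argument encodes every character listed in Table 11.2. The helper name `encode_value` is hypothetical. The function emits upper-case hex digits (e.g. %3D rather than %3d); the two spellings are equivalent.

```python
from urllib.parse import quote


def encode_value(value):
    """Percent-encode a parameter value, including all characters in Table 11.2."""
    # safe="" makes quote() encode "/" as well, which it would otherwise keep.
    return quote(value, safe="")


# The CQL query from the first searchRetrieve example, before encoding.
print(encode_value('bib.name="Miller"'))
# prints bib.name%3D%22Miller%22
```

Note that `quote` writes a space as %20, whereas the examples in this section use "+" between the words of a query.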

Schemas

RefDB can return the datasets using two different XML schemas which you can request with the recordSchema parameter:

MODS: MODS is a schema for bibliographic data in library applications. Use 'mods' as the parameter value. The returned datasets will use 'mods' as the namespace prefix. MODS is the default if you do not specify a schema.
risx: This is RefDB's default XML input and output format. Use 'risx' as the parameter value to request risx. The datasets will use 'risx' as the namespace prefix.

Databases

SRU assumes that the base URL of the SRU service (the one you enter to get an explain response) corresponds to one database. Instead of requiring several copies of the CGI script to serve more than one database, refdbsru lets you specify the name of a database in the additional path information of the URL. Compare the following (pseudo-)URLs:

http://myserver.com/cgi-bin/refdbsru/?<query>
http://myserver.com/cgi-bin/refdbsru/foo?<query>

The first URL will use the default database. The second URL will use the database "foo" instead. The database name goes between the slash that follows the CGI script name and the question mark that opens the query string.
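The placement of the database name can be sketched in a few lines of Python. The function name `service_url` is hypothetical; the base URL is the pseudo-URL used above, and "foo" is the example database name from the text.

```python
from urllib.parse import quote

# Pseudo-URL from the examples above.
BASE_URL = "http://myserver.com/cgi-bin/refdbsru/"


def service_url(database=None, base_url=BASE_URL):
    """Return the service URL, optionally selecting a database by name."""
    if not database:
        # No extra path information: the default database is used.
        return base_url
    # The database name follows the slash after the CGI script name.
    return base_url.rstrip("/") + "/" + quote(database, safe="")


print(service_url())
print(service_url("foo") + "?operation=explain&version=1.1")
```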

The scan operation

The purpose of the scan operation is to provide a list of matching query terms, along with the number of references each term would retrieve. This is similar to browsing through a stack of library cards with subjects or author names on them. The RefDB SRU service lets you scan the following database fields:

keywords (bib.subject)
http://mybox.com/cgi-bin/refdbsru/?operation=scan&version=1.1&scanClause=bib.subject%3d%22dichroism%22
author names (bib.name)
http://mybox.com/cgi-bin/refdbsru/?operation=scan&version=1.1&scanClause=bib.name%3d%22Henry J.%22
journal abbreviations (bib.titleAbbrev)
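Scan URLs follow the same pattern as the other operations, so they can be generated in the same way. The sketch below is an illustration only: the function name `scan_url` is hypothetical, and rejecting other indexes on the client side is an assumption of this example, based on the list of scannable fields above.

```python
from urllib.parse import urlencode

# Base URL assumed throughout this section.
BASE_URL = "http://mybox.com/cgi-bin/refdbsru/"

# Indexes listed above as available for the scan operation.
SCANNABLE_INDEXES = ("bib.subject", "bib.name", "bib.titleAbbrev")


def scan_url(index, term, base_url=BASE_URL, version="1.1"):
    """Return a scan URL for the given index and unencoded search term."""
    if index not in SCANNABLE_INDEXES:
        raise ValueError("index cannot be scanned: " + index)
    params = {
        "operation": "scan",
        "version": version,
        "scanClause": '{}="{}"'.format(index, term),
    }
    # urlencode percent-encodes '=' and '"' and writes spaces as '+'.
    return base_url + "?" + urlencode(params)


print(scan_url("bib.subject", "dichroism"))
print(scan_url("bib.name", "Henry J."))
```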
