Maintenance Procedures ZEBRASRV(8)
NAME
zebrasrv - Zebra Server
SYNOPSIS
zebrasrv [-install] [-installa] [-remove] [-a file]
[-v level] [-l file] [-u uid] [-c config]
[-f vconfig] [-C fname] [-t minutes] [-k kilobytes]
[-d daemon] [-w dir] [-p pidfile] [-ziDST1]
[listener-spec...]
DESCRIPTION
Zebra is a high-performance, general-purpose structured text
indexing and retrieval engine. It reads structured records
in a variety of input formats (e.g. email, XML, MARC) and
allows access to them through exact boolean search
expressions and relevance-ranked free-text queries.
zebrasrv is the Z39.50 and SRU frontend server for the Zebra
search engine and indexer.
On Unix you can run the zebrasrv server from the command
line - and put it in the background. It may also operate
under the inet daemon. On WIN32 you can run the server as a
console application or as a WIN32 Service.
OPTIONS
The options for zebrasrv are the same as those for YAZ'
yaz-ztest. Option -c specifies a Zebra configuration file -
if omitted zebra.cfg is read.
-a file
Specify a file for dumping PDUs (for diagnostic
purposes). The special name - (dash) sends output to
stderr.
-S
Don't fork or make threads on connection requests. This
is good for debugging, but not recommended for real
operation: Although the server is asynchronous and
non-blocking, it can be nice to keep a software
malfunction (okay then, a crash) from affecting all
current users. The server can only accept a single
connection in this mode.
-1
Like -S but after one session the server exits. This
mode is for debugging only.
-T
Operate the server in threaded mode. The server creates
a thread for each connection rather than forking a
process. Only available on UNIX systems that offer
POSIX threads.
-s
Use the SR protocol (obsolete).
-z
Use the Z39.50 protocol (default). This option and -s
complement each other. You can use both multiple times
on the same command line, between
listener-specifications (see below). This way, you can
set up the server to listen for connections in both
protocols concurrently, on different local ports.
-l file
Specify an output file for the diagnostic messages. The
default is to write this information to stderr.
-c config-file
Read configuration information from config-file. The
default configuration is ./zebra.cfg.
-f vconfig
This specifies an XML file that describes one or more
YAZ frontend virtual servers. See the section YAZ SERVER
VIRTUAL HOSTS for details.
-C fname
Sets SSL certificate file name for server (PEM).
-v level
The log level. Use a comma-separated list of members of
the set {fatal,debug,warn,log,malloc,all,none}.
-u uid
Set user ID. Sets the real UID of the server process to
that of the given user. It's useful if you aren't
comfortable with having the server run as root, but you
need to start it as such to bind a privileged port.
-w working-directory
The server changes to this working directory before
listening for incoming connections. This option is
useful when the server is operating from the inetd
daemon (see -i).
-p pidfile
Specifies that the server should write its process ID to
the file given by pidfile. A typical location would be
/var/run/zebrasrv.pid.
-i
Use this to make the server run from the inetd
server (UNIX only). Make sure you use the logfile option
-l in conjunction with this mode and specify the -l
option before any other options.
-D
Use this to make the server put itself in the background
and run as a daemon. If neither -i nor -D is given, the
server starts in the foreground.
-install
Use this to install the server as an NT service (Windows
NT/2000/XP only). Control the server by going to the
Services in the Control Panel.
-installa
Use this to install and activate the server as an NT
service (Windows NT/2000/XP only). Control the server by
going to the Services in the Control Panel.
-remove
Use this to remove the server from the NT services
(Windows NT/2000/XP only).
-t minutes
Idle session timeout, in minutes. Default is 60 minutes.
-k size
Maximum record size/message size, in kilobytes. Default
is 1024 KB (1 MB).
-d daemon
Set name of daemon to be used in hosts access file. See
hosts_access(5) and tcpd(8).
A listener-spec consists of an optional transport mode
followed by a colon (:) followed by a listener address. The
transport mode is either a file system socket unix, an SSL
TCP/IP socket ssl, or a plain TCP/IP socket tcp (default).
For TCP, an address has the form
hostname | IP-number [: portnumber]
The port number defaults to 210 (standard Z39.50 port) for
privileged users (root), and 9999 for normal users. The
special hostname "@" is mapped to the address INADDR_ANY,
which causes the server to listen on any local interface.
The default behavior for zebrasrv - if started as a
non-privileged user - is to establish a single TCP/IP
listener, for the Z39.50 protocol, on port 9999.
zebrasrv @
zebrasrv tcp:some.server.name.org:1234
zebrasrv ssl:@:3000
To start the server listening on the registered port for
Z39.50, or on a filesystem socket, and to drop root
privileges once the ports are bound, execute the server like
this from a root shell:
zebrasrv -u daemon @
zebrasrv -u daemon tcp:@:210
zebrasrv -u daemon unix:/some/file/system/socket
Here daemon is an existing user account, and the unix socket
/some/file/system/socket is readable and writable for the
daemon account.
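The listener-spec rules above can be sketched in a few lines of
Python. This only illustrates the syntax described in this section;
it is not the parser zebrasrv itself uses, and the function name
parse_listener_spec and its is_root flag are hypothetical. IPv6
addresses are not handled.

```python
def parse_listener_spec(spec, is_root=False):
    """Split a listener-spec into (transport, address, port).

    Hypothetical helper that mirrors the rules in this section;
    it is not part of Zebra or YAZ.
    """
    transport = "tcp"  # plain TCP/IP is the documented default
    rest = spec
    prefix, sep, tail = spec.partition(":")
    if sep and prefix in ("tcp", "ssl", "unix"):
        transport, rest = prefix, tail
    if transport == "unix":
        # For file system sockets the address is a path; there is no port.
        return transport, rest, None
    host, sep, port = rest.partition(":")
    if sep:
        port = int(port)
    else:
        # Documented defaults: 210 for root, 9999 for normal users.
        port = 210 if is_root else 9999
    return transport, host, port


if __name__ == "__main__":
    for example in ("@", "tcp:some.server.name.org:1234",
                    "ssl:@:3000", "unix:/some/file/system/socket"):
        print(example, "->", parse_listener_spec(example))
```

Passing is_root=True models the privileged default, so
parse_listener_spec("@", is_root=True) yields port 210.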
Z39.50 PROTOCOL SUPPORT AND BEHAVIOR
Z39.50 Initialization
During initialization, the server will negotiate to version
3 of the Z39.50 protocol, and the option bits for Search,
Present, Scan, NamedResultSets, and concurrentOperations
will be set, if requested by the client. The maximum PDU
size is negotiated down to a maximum of 1 MB by default.
Z39.50 Search
The supported query types are 1 and 101. All operators are
currently supported with the restriction that only proximity
units of type "word" are supported for the proximity
operator. Queries can be arbitrarily complex. Named result
sets are supported, and result sets can be used as operands
without limitations. Searches may span multiple databases.
The server has full support for piggy-backed retrieval (see
also the following section).
Z39.50 Present
The present facility is supported in a standard fashion. The
requested record syntax is matched against the ones
supported by the profile of each record retrieved. If no
record syntax is given, SUTRS is the default. The requested
element set name, again, is matched against any provided by
the relevant record profiles.
Z39.50 Scan
The attribute combinations provided with the
termListAndStartPoint are processed in the same way as
operands in a query (see above). Currently, only the term
and the globalOccurrences are returned with the termInfo
structure.
Z39.50 Sort
Z39.50 specifies three different types of sort criteria. Of
these Zebra supports the attribute specification type in
which case the use attribute specifies the "Sort register".
Sort registers are created for those fields that are of type
"sort" in the default.idx file. The corresponding character
mapping file in default.idx specifies the ordinal of each
character used in the actual sort.
Z39.50 allows the client to specify sorting on one or more
input result sets and one output result set. Zebra supports
sorting on one result set only which may or may not be the
same as the output result set.
Z39.50 Close
If a Close PDU is received, the server will respond with a
Close PDU with reason=FINISHED, no matter which protocol
version was negotiated during initialization. If the
protocol version is 3 or more, the server will generate a
Close PDU under certain circumstances, including a session
timeout (60 minutes by default), and certain kinds of
protocol errors. Once a Close PDU has been sent, the
protocol association is considered broken, and the transport
connection will be closed immediately upon receipt of
further data, or following a short timeout.
Z39.50 Explain
Zebra maintains a "classic" Z39.50 Explain[1] database on
the side. This database is called IR-Explain-1 and can be
searched using the attribute set exp-1.
The records in the explain database are of type grs.sgml.
The root element for the Explain grs.sgml records is
explain, thus explain.abs is used for indexing.
Note
Zebra must be able to locate explain.abs in order to index
the Explain records properly. Zebra will work without it but
the information will not be searchable.
THE SRU SERVER
In addition to Z39.50, Zebra supports the more recent and
web-friendly IR protocol SRU[2]. SRU can be carried over
SOAP or a REST-like protocol that uses HTTP GET or POST to
request search responses. The request itself is made of
parameters such as query, startRecord, maximumRecords and
recordSchema; the response is an XML document containing
hit-count, result-set records, diagnostics, etc. SRU can be
thought of as a re-casting of Z39.50 semantics in
web-friendly terms; or as a standardisation of the ad-hoc
query parameters used by search engines such as Google and
AltaVista; or as a superset of A9's OpenSearch (which it
predates).
Zebra supports Z39.50, SRU GET, SRU POST, SRU SOAP (SRW) -
on the same port, recognising what protocol is used by each
incoming request and handling it accordingly. This is
achieved through the use of Deep Magic; civilians are warned
not to stand too close.
Running zebrasrv as an SRU Server
Because Zebra supports all protocols on one port, it would
seem to follow that the SRU server is run in the same way as
the Z39.50 server, as described above. This is true, but
only in an uninterestingly vacuous way: a Zebra server run
in this manner will indeed recognise and accept SRU
requests; but since it doesn't know how to handle the CQL
queries that these protocols use, all it can do is send
failure responses.
Note
It is possible to cheat, by having SRU search Zebra with a
PQF query instead of CQL, using the x-pquery parameter
instead of query. This is a non-standard extension of SRU,
and a very naughty thing to do, but it does give you a way
to see Zebra serving SRU "right out of the box". If you
start your favourite Zebra server in the usual way, on port
9999, then you can send your web browser to:
http://localhost:9999/Default?version=1.1
&operation=searchRetrieve
&x-pquery=mineral
&startRecord=1
&maximumRecords=1
This will display the XML-formatted SRU response that
includes the first record in the result-set found by the
query mineral. (For clarity, the SRU URL is shown here
broken across lines, but the lines should be joined
together to make a single-line URL for the browser to submit.)
In order to turn on Zebra's support for CQL queries, it's
necessary to have the YAZ generic front-end (which Zebra
uses) translate them into the Z39.50 Type-1 query format
that is used internally. And to do this, the generic
front-end's own configuration file must be used. See the
section called YAZ SERVER VIRTUAL HOSTS; the salient point
for SRU support is that zebrasrv must be started with the
-f frontendConfigFile option rather than the
-c zebraConfigFile option, and that the front-end
configuration file must include both a reference to the
Zebra configuration file and the CQL-to-PQF translator
configuration file.
A minimal front-end configuration file that does this would
read as follows:
<yazgfs>
  <server>
    <config>zebra.cfg</config>
    <cql2rpn>../../tab/pqf.properties</cql2rpn>
  </server>
</yazgfs>
The <config> element contains the name of the Zebra
configuration file that was previously specified by the -c
command-line argument, and the <cql2rpn> element contains
the name of the CQL properties file specifying how various
CQL indexes, relations, etc. are translated into Type-1
queries.
A Zebra server running with such a configuration can then be
queried using proper, conformant SRU URLs with CQL queries:
http://localhost:9999/Default?version=1.1
&operation=searchRetrieve
&query=title=utah and description=epicent*
&startRecord=1
&maximumRecords=1
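When such a URL is built by a program rather than typed into a
browser, the spaces, equals signs and wildcard in the CQL query need
percent-encoding. The following is a minimal sketch using only the
Python standard library; the function name sru_search_url and its
pqf flag are hypothetical, and the base URL is the one used in the
example above.

```python
from urllib.parse import quote, urlencode


def sru_search_url(base, query, start=1, maximum=1, pqf=False):
    """Build an SRU 1.1 searchRetrieve URL (hypothetical helper).

    With pqf=True the query is sent in the non-standard x-pquery
    parameter instead of the standard query parameter.
    """
    params = [
        ("version", "1.1"),
        ("operation", "searchRetrieve"),
        ("x-pquery" if pqf else "query", query),
        ("startRecord", str(start)),
        ("maximumRecords", str(maximum)),
    ]
    # quote_via=quote encodes spaces as %20 rather than "+".
    return base + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(sru_search_url("http://localhost:9999/Default",
                         "title=utah and description=epicent*"))
```

The resulting single-line URL carries the same parameters as the
broken-across-lines form shown above.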
SRU PROTOCOL SUPPORT AND BEHAVIOR
Zebra running as an SRU server supports SRU version 1.1,
including CQL version 1.1. In particular, it provides
support for the following elements of the protocol.
SRU Search and Retrieval
Zebra supports the SRU searchRetrieve[3] operation.
One of the great strengths of SRU is that it mandates a
standard query language, CQL, and that all conforming
implementations can therefore be trusted to correctly
interpret the same queries. It is with some shame, then,
that we admit that Zebra also supports an additional query
language, our own Prefix Query Format (PQF[4]). A PQF query
is submitted by using the extension parameter x-pquery, in
which case the query parameter must be omitted, which makes
the request not valid SRU. Please feel free to use this
facility within your own applications; but be aware that it
is not only non-standard SRU but not even syntactically
valid, since it omits the mandatory query parameter.
SRU Scan
Zebra supports the SRU scan[5] operation. Scanning using CQL
syntax is the default, where the standard scanClause
parameter is used.
In addition, a mutant form of SRU scan is supported, using
the non-standard x-pScanClause parameter in place of the
standard scanClause to scan on a PQF query clause.
SRU Explain
Zebra supports SRU explain[6].
The ZeeRex record explaining a database may be requested
either with a fully fledged SRU request (with
operation=explain and version-number specified) or with a
simple HTTP GET at the server's basename. The ZeeRex record
returned in response is the one embedded in the YAZ Frontend
Server configuration file that is described in the
section called YAZ SERVER VIRTUAL HOSTS.
Unfortunately, the data found in the CQL-to-PQF text file
must be added by hand to the explain section of the
YAZ Frontend Server configuration file to be able to provide
a suitable explain record. Too bad, but this is all extremely
new alpha stuff, and a lot of work has yet to be done.
There is no linkage whatsoever between the Z39.50 explain
model and the SRU explain response (at least, none is
implemented in Zebra). Zebra does not provide a
means of obtaining the ZeeRex record using Z39.50.
Other SRU operations
In the Z39.50 protocol, Initialization, Present, Sort and
Close are separate operations. In SRU, however, these
operations do not exist.
o SRU has no explicit initialization handshake phase, but
commences immediately with searching, scanning and
explain operations.
o Neither does SRU have a close operation, since the
protocol is stateless and each request is
self-contained. (It is true that multiple SRU
request/response pairs may be implemented as multiple
HTTP request/response pairs over a single persistent
TCP/IP connection; but the closure of that connection is
not a protocol-level operation.)
o Retrieval in SRU is part of the searchRetrieve
operation, in which a search is submitted and the
response includes a subset of the records in the result
set. There is no direct analogue of Z39.50's Present
operation which requests records from an established
result set. In SRU, this is achieved by sending a
subsequent searchRetrieve request with the query
cql.resultSetId=id where id is the identifier of the
previously generated result-set.
o Sorting in SRU is done within the searchRetrieve
operation - in v1.1, by an explicit sort parameter, but
the forthcoming v1.2 or v2.0 will most likely use an
extension of the query language, CQL sorting[7].
It can be seen, then, that while Zebra operating as an SRU
server does not provide the same set of operations as when
operating as a Z39.50 server, it does provide equivalent
functionality.
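The Present-like pattern described in the list above, repeated
searchRetrieve requests against cql.resultSetId, can be sketched as
a generator of request parameters. This is an illustration only: the
function name is hypothetical, the result-set identifier is assumed
to have been taken from an earlier response, and whether the server
still holds that result set is not checked here.

```python
def present_like_requests(result_set_id, total_hits, page_size):
    """Yield searchRetrieve parameters that page through a result set.

    Hypothetical helper; SRU record positions start at 1.
    """
    for start in range(1, total_hits + 1, page_size):
        yield {
            "version": "1.1",
            "operation": "searchRetrieve",
            "query": "cql.resultSetId=" + result_set_id,
            "startRecord": start,
            # The last page may be shorter than page_size.
            "maximumRecords": min(page_size, total_hits - start + 1),
        }


if __name__ == "__main__":
    for params in present_like_requests("some-id", 7, 3):
        print(params["startRecord"], params["maximumRecords"])
```

Each yielded dictionary corresponds to one self-contained HTTP
request, in keeping with the stateless nature of SRU.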
SRU EXAMPLES
Surf into http://localhost:9999 to get an explain response,
or use
http://localhost:9999/?version=1.1&operation=explain
See number of hits for a query
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=text=(plant%20and%20soil)
Fetch records 5-6 in Dublin Core format
http://localhost:9999/?version=1.1&operation=searchRetrieve
&query=text=(plant%20and%20soil)
&startRecord=5&maximumRecords=2&recordSchema=dc
Even search using PQF queries using the extended naughty
parameter x-pquery
http://localhost:9999/?version=1.1&operation=searchRetrieve
&x-pquery=@attr%201=text%20@and%20plant%20soil
Or scan indexes using the extended extremely naughty
parameter x-pScanClause
http://localhost:9999/?version=1.1&operation=scan
&x-pScanClause=@attr%201=text%20something
Don't do this in production code! But it's a great fast
debugging aid.
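A program issuing requests like the ones above will usually want
the hit count out of the XML response. The sketch below parses a
hand-written sample, not real server output; it assumes the SRU 1.1
response namespace http://www.loc.gov/zing/srw/ and the
numberOfRecords element, which should be checked against what your
server actually returns. The function name hit_count is
hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed SRU 1.1 response namespace; verify against real responses.
SRW_NS = "http://www.loc.gov/zing/srw/"

# Hand-written sample for illustration, not actual Zebra output.
SAMPLE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>42</zs:numberOfRecords>
</zs:searchRetrieveResponse>"""


def hit_count(xml_text):
    """Return the hit count from an SRU response, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("{%s}numberOfRecords" % SRW_NS)
    return int(node.text) if node is not None else None


if __name__ == "__main__":
    print(hit_count(SAMPLE))
```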
YAZ SERVER VIRTUAL HOSTS
The Virtual hosts mechanism allows a YAZ frontend server to
support multiple backends. A backend is selected on the
basis of the TCP/IP binding (port+listening address) and/or
the virtual host.
A backend can be configured to execute in a particular
working directory. Or the YAZ frontend may perform CQL[8] to
RPN conversion, thus allowing traditional Z39.50 backends to
be offered as an SRU[2] service. SRU Explain information for
a particular backend may also be specified.
For the HTTP protocol, the virtual host is specified in the
Host header. For the Z39.50 protocol, the virtual host is
specified in the OtherInfo of the Initialize Request, under OID
1.2.840.10003.10.1000.81.1.
Note
Not all Z39.50 clients allow the VHOST information to be
set. For those the selection of the backend must rely on the
TCP/IP information alone (port and address).
The YAZ frontend server uses XML to describe the backend
configurations. Command-line option -f specifies the filename of
the XML configuration.
The configuration uses the root element yazgfs. This element
includes a list of listen elements, followed by one or more
server elements.
The listen element describes a listener (transport end point), such as
TCP/IP, Unix file socket or SSL server. Content for a
listener:
CDATA (required)
The CDATA for the listen element holds the listener
string, such as tcp:@:210, tcp:server1:2100, etc.
attribute id (optional)
Identifier for this listener. This may be referred to
from server sections.
Note
We expect more information to be added for the listen
section in a future version, such as CERT file for SSL
servers.
The server element describes a server and the parameters for this
server type. Content for a server:
attribute id (optional)
Identifier for this server. Currently not used for
anything, but it might be for logging purposes.
attribute listenref (optional)
Specifies the listener for this server. If this attribute is
not given, the server is accessible from all listeners.
In order for the server to be used for real, however,
the virtual host must match (if specified in the
configuration).
element config (optional)
Specifies the server configuration. This is equivalent
to the config specified using command line option -c.
element directory (optional)
Specifies a working directory for this backend server.
If specified, the YAZ frontend changes the current working
directory to this directory whenever a backend of this
type is started (backend handler bend_start), stopped
(backend handler bend_stop) and initialized (bend_init).
element host (optional)
Specifies the virtual host for this server. If this is
specified a client must specify this host string in
order to use this backend.
element cql2rpn (optional)
Specifies a filename that includes CQL[8] to RPN
conversion for this backend server. See CQL[8] section
in YAZ manual. If given, the backend server will only
"see" a Type-1/RPN query.
element explain (optional)
Specifies SRU[2] ZeeRex content for this server - copied
verbatim to the client. As things are now, some of the
Explain content seems redundant because host
information, etc. is also stored elsewhere.
The format of the Explain record is described in detail,
with examples, on the ZeeRex[9] web-site.
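The way the listenref attribute and the host element narrow down
which backend a connection reaches can be modelled roughly as
follows. This is a simplified sketch of the rules stated above, not
the YAZ frontend's actual algorithm; in particular, taking the first
matching server is an assumption. The function name select_backend
is hypothetical, and the sample data is modelled on the server and
listener names used in the example later in this section.

```python
def select_backend(servers, listener_id, vhost=None):
    """Return the id of the first server matching listener and host.

    Simplified, hypothetical model of the rules in this section.
    """
    for server in servers:
        listenref = server.get("listenref")
        if listenref is not None and listenref != listener_id:
            continue  # server is bound to a different listener
        host = server.get("host")
        if host is not None and host != vhost:
            continue  # client did not name this virtual host
        return server["id"]
    return None


# Sample data modelled on the example configuration in this section.
SERVERS = [
    {"id": "server1", "host": "server1.mydomain"},
    {"id": "server2", "host": "server2.mydomain"},
    {"id": "server3", "listenref": "internal"},
]

if __name__ == "__main__":
    print(select_backend(SERVERS, "public", "server2.mydomain"))
    print(select_backend(SERVERS, "internal"))
```

A client that connects to the public listener without naming a
virtual host matches none of the three servers in this model.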
The XML below configures a server that accepts connections
from two ports, TCP/IP port 9900 and a local UNIX file
socket. We name the TCP/IP server public and the other
server internal.
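<yazgfs>
  <listen id="public">tcp:@:9900</listen>
  <listen id="internal">unix:/var/tmp/socket</listen>
  <server id="server1">
    <host>server1.mydomain</host>
    <directory>/var/www/s1</directory>
    <config>config.cfg</config>
  </server>
  <server id="server2">
    <host>server2.mydomain</host>
    <directory>/var/www/s2</directory>
    <config>config.cfg</config>
    <cql2rpn>../etc/pqf.properties</cql2rpn>
    <explain xmlns="http://explain.z3950.org/dtd/2.0/">
      <serverInfo>
        <host>server2.mydomain</host>
        <port>9900</port>
        <database>a</database>
      </serverInfo>
    </explain>
  </server>
  <server id="server3" listenref="internal">
    <directory>/var/www/s3</directory>
    <config>config.cfg</config>
  </server>
</yazgfs>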
There are three configured backend servers. The first two
servers, "server1" and "server2", can be reached by both
listener addresses - since no listenref attribute is
specified. In order to distinguish between the two, a virtual
host has been specified for each server in its host
element.
For "server2", CQL[8] to RPN conversion is
supported and explain information has been added (a short
one here to keep the example small).
The third server, "server3" can only be reached via listener
"internal".
SEE ALSO
zebraidx(1)
NOTES
1. Z39.50 Explain
http://www.loc.gov/z3950/agency/markup/07.html
2. SRU
http://www.loc.gov/standards/sru/
3. SRU searchRetrieve
http://www.loc.gov/standards/sru/specs/search-retrieve.html
4. PQF
http://www.indexdata.com/yaz/doc/tools.html#PQF
5. SRU scan
http://www.loc.gov/standards/sru/specs/scan.html/
6. SRU explain
http://www.loc.gov/standards/sru/specs/explain.html
7. CQL sorting
http://zing.z3950.org/cql/sorting.html
8. CQL
http://www.loc.gov/standards/sru/specs/cql.html
9. ZeeRex
http://explain.z3950.org/
ATTRIBUTES
See attributes(5) for descriptions of the following
attributes:
________________________________________
| ATTRIBUTE TYPE     | ATTRIBUTE VALUE |
|____________________|_________________|
| Availability       | SUNWidzebra     |
|____________________|_________________|
| Interface Stability| Uncommitted     |
|____________________|_________________|
NOTES
Source for idzebra is available on http://opensolaris.org.
zebra 2.0.40 Last change: 07/13/2009