Gary Lengyel, Sun Microsystems, Inc.
Version 1.1, August 26, 2008
This paper presents an overview of the serviceability guidelines for open-source software products reviewed by the Open Architecture Review Committee(s). The guidelines are outlined in five major areas:
Product Installation and Removal
Product Configuration and Verification
Problem Management
Problem Logging and Recording
Documentation and Troubleshooting Support
Also provided is a section for Solaris-specific products.
These guidelines ensure that an installation will work, that it leaves an audit trail, and that the correct product distribution kit is installed. The uninstaller guidelines ensure that the system can be restored to its state prior to product installation.
Support for automated patching and updating of products should be implemented.
The installer should verify operating system dependencies before installing the product.
The installer should verify the existence of any dependent and/or required software versions before installing the product. For example, a Java product may require that a specific or minimum version of the Java virtual machine or developer kit (JVM or JDK) is already installed, or a product may require the presence of a specific registry or database.
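As an illustration of the dependency check described above, the following is a minimal sketch of a minimum-version comparison. The function names and the version strings in the usage note are hypothetical, and the sketch assumes that versions are dotted numeric strings; how the installed version is discovered is platform-specific and is left out.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.6.0_07' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required version."""
    have, need = parse_version(installed), parse_version(required)
    # Pad the shorter tuple with zeros so that '1.5' and '1.5.0' compare equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

An installer could call meets_minimum("1.6.0_07", "1.5") before copying any files and stop with a clear message if the result is False.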
Installation procedures should log all user input (with the exception of sensitive data such as passwords).
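One way to log user input without recording sensitive data is to redact known sensitive fields before the answers reach the log. The sketch below assumes the installer collects answers as key/value pairs; the key names in SENSITIVE_KEYS are illustrative and would need to match the product's own prompts.

```python
# Hypothetical set of answer keys that must never be written to a log.
SENSITIVE_KEYS = {"password", "passphrase", "secret", "token"}


def redact(answers):
    """Return a copy of the installer answers that is safe to write to a log."""
    return {
        key: "<redacted>" if key.lower() in SENSITIVE_KEYS else value
        for key, value in answers.items()
    }
```

The original dictionary is left unchanged, so the installer can still use the real values while only the redacted copy is logged.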
Installation procedures should verify that the product installation (or upgrade or patch) has been completed successfully.
The names of all files in the product installation package should follow the standard format for the associated platform and should include the platform name and a unique version identifier for the product, upgrade, or patch.
An uninstaller should be included with each product installation.
The uninstaller should only remove those files installed or generated by the product, and should remove all knowledge of the product from the system. This includes any registry entries, configuration files, and directories, as well as product binaries, archives and libraries.
The uninstaller should provide the means to either keep or discard any end-user or configuration data associated with the installation. For example, user names, passwords, deployment descriptors, generated source code, development repositories, and network identifiers should all be optionally preserved when a product is uninstalled.
The uninstaller should verify that the product has been successfully removed from the system.
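The uninstaller guidelines above can be sketched with a manifest that the installer records at installation time. This is only an outline under stated assumptions: the manifest format (pairs of path and kind) and the kind names "product" and "user_data" are hypothetical, and a real uninstaller would also handle directories and registry entries.

```python
import os


def uninstall(manifest, keep_user_data=True):
    """Remove only the files listed in the manifest; return the paths kept."""
    kept = []
    for path, kind in manifest:
        if kind == "user_data" and keep_user_data:
            kept.append(path)  # preserved at the user's request
        elif os.path.isfile(path):
            os.remove(path)
    return kept


def leftover_files(manifest, kept):
    """List manifest paths that still exist but were not deliberately kept."""
    return [
        path for path, _ in manifest
        if path not in kept and os.path.exists(path)
    ]
```

Because removal is driven by the manifest, files the product did not install are never touched, and an empty result from leftover_files serves as the verification step.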
A log entry should be made to record the completion status following the installation or removal of an upgrade, patch, rollback, or the product itself. The log entries should include the unique version identifier for the installed product, patch or upgrade.
These guidelines ensure that changes made to user-configurable settings cannot negatively impact the product, and that the product will run as expected because the attributes and integrity of the product's files have been verified.
The product should validate any configuration changes made by the user before applying them.
Any range limits used by the product should not be hard-coded (outside of values such as MAXINT). This allows service provider personnel to modify settings such as time-out values to assist in diagnosis.
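A minimal sketch of validation against limits that are not hard-coded follows. It assumes that limits ship as defaults and can be overridden from a JSON document; the setting name timeout_seconds, its default range, and the JSON format are all illustrative rather than part of the guidelines.

```python
import json

# Hypothetical shipped defaults: setting name -> [minimum, maximum].
DEFAULT_LIMITS = {"timeout_seconds": [1, 300]}


def load_limits(override_json=None):
    """Return the default limits, updated with any field-supplied overrides."""
    limits = dict(DEFAULT_LIMITS)
    if override_json:
        limits.update(json.loads(override_json))
    return limits


def validate_setting(name, value, limits):
    """Return the value if it is within range, otherwise raise ValueError."""
    low, high = limits[name]
    if not low <= value <= high:
        raise ValueError(
            f"{name}={value} is outside the allowed range {low}..{high}"
        )
    return value
```

Service personnel could then widen a time-out range during diagnosis by supplying an override such as {"timeout_seconds": [1, 3600]} without rebuilding the product.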
Settings and configuration inconsistencies that may result in illegal configurations should be reported via system messaging facilities.
The product should provide a means to check and validate the ownership and protection of its files. Configuration files that contain sensitive information (such as user names and passwords) should be protected from read or write access by any user other than the owner of the product installation.
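On POSIX systems, an ownership and protection check of this kind might look like the sketch below. The function name is hypothetical, and the sketch assumes that the expected owner is identified by numeric user ID and that any group or other permission bit counts as a violation.

```python
import os
import stat


def check_private_file(path, owner_uid):
    """Return a list of problems found with a sensitive configuration file."""
    info = os.stat(path)
    problems = []
    if info.st_uid != owner_uid:
        problems.append(
            f"{path}: owned by uid {info.st_uid}, expected {owner_uid}"
        )
    # Any permission bit granted to group or other is treated as a problem.
    if info.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path}: accessible by group or other users")
    return problems
```

An empty list means the file passed both checks; otherwise each entry can be reported through the product's normal messaging facility.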
The product should verify that any required daemons are started at product startup.
The product should ensure that no files exist that can improperly prevent the product from running (for example, stale lock files or leftover temporary files).
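A stale lock file check can be sketched as follows, assuming the common convention that the lock file contains the process ID of its holder. That convention, and the function name, are assumptions; the existence test relies on POSIX signal semantics and is not suitable for Windows.

```python
import os


def lock_is_stale(lock_path):
    """Return True if the lock file does not belong to a running process."""
    try:
        with open(lock_path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return True  # unreadable or malformed lock files are treated as stale
    if pid <= 0:
        return True  # not a valid process ID
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence; nothing is sent
    except ProcessLookupError:
        return True  # no such process: the holder has gone away
    except PermissionError:
        return False  # the process exists but belongs to another user
    return False
```

At startup the product could remove a lock file for which this function returns True and log that it did so, rather than refusing to run.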
These guidelines ensure that a diagnostician can easily locate the source of a problem.
The product should be able to capture its state on demand as well as when a fatal error occurs.
The product should validate internal and external parameters to function and method calls.
The product should detect and report when the resources it uses (for example, memory and disk space) are being depleted, and should recommend corrective action.
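For disk space, the detection and recommendation could be combined as in the sketch below. The 10 percent threshold and the wording of the recommendation are illustrative assumptions; the optional usage parameter exists only so that the check can be exercised without a nearly full disk.

```python
import shutil


def disk_warning(path, min_free_fraction=0.10, usage=None):
    """Return a warning with a recommendation, or None if space is sufficient."""
    usage = usage or shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        return (
            f"Only {free_fraction:.0%} of disk space is free on {path}; "
            "archive or delete old log files, or add storage."
        )
    return None
```

A product might run such a check periodically and route any returned message through its normal logging mechanism.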
The product should provide consistent logging mechanisms, message formats, log files, and log file locations.
These guidelines ensure that the required information needed to diagnose a problem is available:
Each log file should contain the unique name and version and patch identifier for the product.
Log file sizes should be user-configurable.
When a log file exceeds its maximum size, older entries should be archived to make room for new entries.
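Size-limited logs with archiving are available in many standard logging libraries. As one example, the sketch below uses RotatingFileHandler from the Python standard library; the logger name, the log format, and the default number of archived files are illustrative assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(log_path, max_bytes, backup_count=5, name="product"):
    """Create a logger whose file is archived once it reaches max_bytes."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep entries out of the root logger
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

When the size limit is reached, the current file is renamed with a numeric suffix (for example, product.log.1) and a new file is started; max_bytes would normally be read from the product's user-editable configuration.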
The product should record all error events in persistent storage, using the standard logging facility of its platform (for example, Java products should use the JSR 47 logging interfaces).
The product should log all configuration changes, and should include the following information:
Setting name
New value
Previous value
Timestamp
Name and version of the change agent
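The configuration change record listed above could be serialized as in the following sketch. JSON is used here only as a convenient structured format, and the field names and function name are hypothetical.

```python
import json
from datetime import datetime, timezone


def change_record(setting, new_value, previous_value, agent, agent_version):
    """Build one log line describing a single configuration change."""
    return json.dumps({
        "setting": setting,
        "new_value": new_value,
        "previous_value": previous_value,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "change_agent": f"{agent} {agent_version}",
    })
```

Recording the previous value alongside the new one lets a diagnostician reverse a change without consulting a separate backup of the configuration.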
The product should suspend (throttle) the reporting and logging of correctable errors for a fixed period of time when such errors occur at high frequency. Error throttling period values should be field adjustable.
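Error throttling can be implemented in several ways; the sketch below shows one, under the assumption that "high frequency" means more than a given number of events with the same message identifier inside a sliding time window. The class name, parameter names, and default values are illustrative, and all three values are constructor arguments so that they remain field adjustable.

```python
import time
from collections import deque


class ErrorThrottle:
    """Suppress logging of a message id for a while after a burst of events."""

    def __init__(self, max_events=5, window_seconds=10.0,
                 quiet_seconds=60.0, clock=time.monotonic):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.quiet_seconds = quiet_seconds
        self.clock = clock
        self._recent = {}       # message id -> timestamps inside the window
        self._quiet_until = {}  # message id -> time at which logging resumes

    def should_log(self, message_id):
        """Return True if an event with this message id should be logged now."""
        now = self.clock()
        if now < self._quiet_until.get(message_id, float("-inf")):
            return False
        recent = self._recent.setdefault(message_id, deque())
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()  # forget events that fell out of the window
        recent.append(now)
        if len(recent) > self.max_events:
            self._quiet_until[message_id] = now + self.quiet_seconds
            recent.clear()
            return False
        return True
```

A caller would guard each log statement for a correctable error with should_log and skip the entry when it returns False. A fuller implementation would probably also log a single notice when throttling begins, so that the gap in the log is explained.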
The product should include the following information in every log entry:
Timestamp - preferably in a standard format (RFC 3339 or similar)
Module - acronym of the module generating the message
LogLevel - relative severity, following an applicable standard if any
MessageId - unique within the product
MessageText - brief text explaining the event
ContextInfo - key values that go with the message (context); include a stack trace if a program error is suspected
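The fields above could be laid out on a single line as in the following sketch. The field order, the space separator, and the bracketed key=value context are assumptions, since the guidelines name the fields but not a concrete layout; the message identifier in the usage note is likewise hypothetical.

```python
from datetime import datetime, timezone


def format_entry(module, level, message_id, text, context=None, now=None):
    """Format one log entry containing all six recommended fields."""
    now = now or datetime.now(timezone.utc)
    # Sort the context keys so that identical events produce identical lines.
    context_text = " ".join(
        f"{key}={value}" for key, value in sorted((context or {}).items())
    )
    return (
        f"{now.isoformat()} {module} {level} {message_id} "
        f"{text} [{context_text}]"
    )
```

For example, format_entry("INST", "ERROR", "INST-0042", "Disk full", {"path": "/var"}) yields a line that begins with a UTC timestamp in RFC 3339 form, followed by the remaining fields in order.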
These guidelines present the documentation required by a service provider to aid in the diagnosis and verification of product functionality. The documentation needed by a service provider includes:
Test plans
Tested configurations list
Data scrubbing procedures
Re-certification procedures
Troubleshooting guide
Error code list
List of TCP/IP ports used by products
Minimum OS configuration list
In addition, access to all documentation (specifications, architecture documents, design, source code, debug and rebuild) is required.
This section presents guidelines to be used by the Solaris Operating System and/or those products designed to execute on Solaris.
Products installed on Solaris should use the Service Management Facility (SMF) to ensure correct daemon startup and management.
DTrace probes should be utilized to promote ease of diagnosis and traceability.
Product events (errors) should utilize the Solaris Fault Manager for reporting and diagnosis.
The Image Packaging System (once completed and available) should be used for the installation of software packages.