Gary Lengyel, Sun Microsystems, Inc.
Version 1.0, August 25, 2008
This paper presents an overview of the serviceability guidelines for open-source software products reviewed by the Open Architecture Review Committee(s). The guidelines are outlined in five major areas:
Product Installation and Removal
Product Configuration and Verification
Problem Management
Problem Logging and Recording
Documentation and Troubleshooting Support
Also provided is a section for Solaris-specific products.
These guidelines ensure that an installation will work, that it leaves an audit trail, and that the correct product distribution kit is installed. The uninstaller guidelines ensure that the system can be restored to its state prior to product installation.
Support for automated patching and updating of products shall be implemented.
The installer should verify operating system dependencies before installing the product.
The installer should verify the existence of any dependent and/or required software versions before installing the product. For example, a Java product may require that a specific or minimum version of the Java Virtual Machine (JVM) or Java Development Kit (JDK) is already installed, or a product may require the presence of a specific registry or database.
Installation procedures shall log all user input.
Installation procedures shall verify that the product installation (or upgrade or patch) has been completed successfully.
The names of all files in the product installation package shall follow the standard format for the associated platform and shall include the platform name and a unique version identifier for the product, upgrade, or patch.
An uninstaller will be included with each product installation.
The uninstaller shall only remove those files installed or generated by the product, and shall remove all knowledge of the product from the system. This includes any registry entries, configuration files, and directories, as well as product binaries, archives and libraries.
The uninstaller shall provide the means to either keep or discard any end-user or configuration data associated with the installation. For example, user names, passwords, deployment descriptors, generated source code, development repositories, and network identifiers should all be optionally preserved when a product is uninstalled.
The uninstaller shall verify that the product has been successfully removed from the system.
A log entry shall be made to record the completion status following the installation or removal of an upgrade, patch, rollback, or the product itself. The log entries shall include the unique version identifier for the installed product, patch or upgrade.
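As an illustration only, the following Java sketch shows how an installer and uninstaller might meet three of the guidelines above: a minimum-JVM dependency check, manifest-driven removal that deletes only the files the product installed while optionally preserving configuration data, and a completion record carrying the unique version identifier. It assumes the installer writes a manifest of absolute paths at install time; the class `InstallSketch`, its methods, and the record format are hypothetical and are not part of any Sun installer or packaging API.

```java
import java.io.IOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating installer/uninstaller checks; not a Sun API.
final class InstallSketch {

    // Dependency check: is the running JVM's feature release at least minFeature?
    static boolean jvmAtLeast(int minFeature) {
        return Runtime.version().feature() >= minFeature;
    }

    // Removes only the paths recorded in the install manifest (assumed to be
    // absolute paths written by the installer). Paths in keep, and directories
    // that contain them, are preserved at the user's request. Returns every
    // manifest entry that should have been removed but still exists, so an
    // empty list means the removal was verified.
    static List<Path> uninstall(List<Path> manifest, Set<Path> keep) throws IOException {
        List<Path> deepestFirst = new ArrayList<>(manifest);
        deepestFirst.sort(Comparator.comparingInt(Path::getNameCount).reversed());
        for (Path p : deepestFirst) {
            if (isPreserved(p, keep)) {
                continue;
            }
            try {
                Files.deleteIfExists(p);
            } catch (DirectoryNotEmptyException e) {
                // Holds files the product did not install: leave it and report it below.
            }
        }
        List<Path> leftovers = new ArrayList<>();
        for (Path p : manifest) {
            if (!isPreserved(p, keep) && Files.exists(p)) {
                leftovers.add(p);
            }
        }
        return leftovers;
    }

    // A path is preserved if it is kept or is a directory containing a kept path.
    private static boolean isPreserved(Path p, Set<Path> keep) {
        return keep.stream().anyMatch(k -> k.startsWith(p));
    }

    // One audit-trail line recording completion status and the unique version identifier.
    static String completionRecord(Instant when, String action, String versionId, boolean ok) {
        return when + " " + action + " " + versionId + " " + (ok ? "SUCCESS" : "FAILED");
    }
}
```

Removing entries deepest-first empties each directory before the uninstaller tries to delete it. A directory that still holds files the product did not create is left in place and reported rather than deleted, so user data outside the manifest is never touched.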
These guidelines ensure that changes made to user-configurable settings cannot negatively impact the product, and that the product will run as expected, by verifying the attributes and integrity of the product's files.
The product shall attempt to validate any configuration changes made by the user.
Any range limits used by the product shall not be hard-coded (other than values such as MAXINT). This allows service provider personnel to modify settings such as time-out values to assist in diagnosis.
Settings and configuration inconsistencies that may result in illegal configurations shall be reported via system messaging facilities.
The product shall provide a means to check and validate the ownership and protection of its files. Configuration files that contain sensitive information (such as user names and passwords) shall be protected from read or write access by any user other than the owner of the product installation.
The product shall verify that any required daemons are started at product startup.
The product shall ensure that no files exist that can prevent the product from running (for example, lock files or temporary files).
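A minimal sketch of the configuration and verification guidelines, written against the standard Java library. It assumes that settings live in a `java.util.Properties` file, that the product writes its process ID into its lock file, and that the installation sits on a POSIX file system; `java.util.logging` stands in for the system messaging facility (on Solaris this might instead be syslog). The class, logger, and setting names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Properties;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical helper; java.util.logging stands in for the system messaging facility.
final class ConfigChecks {
    private static final Logger LOG = Logger.getLogger("exampleprod.config");

    private static final Set<PosixFilePermission> GROUP_OR_OTHERS = EnumSet.of(
            PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
            PosixFilePermission.GROUP_EXECUTE, PosixFilePermission.OTHERS_READ,
            PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE);

    // Reads a positive integer setting (for example a time-out) from configuration
    // instead of hard-coding it. Invalid values are reported and replaced by the default.
    static int positiveIntSetting(Properties cfg, String key, int defaultValue) {
        String raw = cfg.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            if (value > 0) {
                return value;
            }
            LOG.warning(key + "=" + value + " must be positive; using " + defaultValue);
        } catch (NumberFormatException e) {
            LOG.warning(key + "=\"" + raw + "\" is not an integer; using " + defaultValue);
        }
        return defaultValue;
    }

    // True if the file belongs to expectedOwner and grants nothing to group or others
    // (POSIX file systems only).
    static boolean ownedAndPrivate(Path file, String expectedOwner) throws IOException {
        if (!Files.getOwner(file).getName().equals(expectedOwner)) {
            return false;
        }
        return Files.getPosixFilePermissions(file).stream().noneMatch(GROUP_OR_OTHERS::contains);
    }

    // Assumes the product writes its PID into the lock file at startup. Removes the
    // lock if that process is gone (or the contents are unreadable) and returns true;
    // returns false if a live process still holds it.
    static boolean clearStaleLock(Path lock) throws IOException {
        if (!Files.exists(lock)) {
            return true;
        }
        long pid;
        try {
            pid = Long.parseLong(Files.readString(lock).trim());
        } catch (NumberFormatException e) {
            pid = -1;
        }
        boolean alive = pid > 0
                && ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
        if (alive) {
            return false;
        }
        Files.delete(lock);
        LOG.info("Removed stale lock file " + lock);
        return true;
    }
}
```

A PID recorded in a lock file can be reused by an unrelated process after a crash, so a production check would likely compare more than the PID (for example, the process start time) before treating the lock as live.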
These guidelines ensure that a diagnostician can easily locate the source of a problem.
The product shall be able to capture its state on demand as well as when a fatal error occurs.
The product shall validate internal and external parameters passed to functions and methods.
The product shall detect and report when resources it uses (for example, memory and disk space) are being depleted, along with recommended corrective actions.
The product shall provide consistent logging mechanisms, message formats, log files, and log file locations.
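The sketch below illustrates the problem management guidelines with standard Java facilities: `Thread.getAllStackTraces()` for an on-demand state capture, a default uncaught-exception handler for capture on a fatal error, explicit parameter validation, and a free-space check that returns a recommended action. The class name, message wording, and threshold are hypothetical; a real product would likely also watch heap usage over time and write the snapshot to its log rather than to standard error.

```java
import java.io.File;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical diagnostic helpers; names and thresholds are illustrative only.
final class Diagnostics {

    // Snapshot of heap usage plus every live thread's stack, for on-demand capture.
    static String captureState() {
        Runtime rt = Runtime.getRuntime();
        StringBuilder sb = new StringBuilder()
                .append("heap used=").append(rt.totalMemory() - rt.freeMemory())
                .append(" max=").append(rt.maxMemory()).append('\n');
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\" ")
              .append(e.getKey().getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Captures the same snapshot when any thread dies from an uncaught exception.
    static void captureOnFatalError() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Fatal error in thread " + thread.getName() + ": " + error);
            System.err.print(captureState());
        });
    }

    // Validates its parameters, then returns a warning with a recommended action when
    // the usable fraction of the file system holding dir drops below minFreeFraction.
    static Optional<String> checkDiskSpace(File dir, double minFreeFraction) {
        Objects.requireNonNull(dir, "dir");
        if (!(minFreeFraction > 0 && minFreeFraction < 1)) {
            throw new IllegalArgumentException("minFreeFraction must be in (0, 1): " + minFreeFraction);
        }
        long total = dir.getTotalSpace();   // 0 if dir does not exist
        long usable = dir.getUsableSpace();
        if (total == 0 || (double) usable / total >= minFreeFraction) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Disk space low on %s: %d of %d bytes usable; archive or delete old logs, "
                + "or enlarge the file system", dir, usable, total));
    }
}
```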
These guidelines ensure that the required information needed to diagnose a problem is available:
Each log file should contain the product's unique name, version, and patch identifier.
Log file sizes shall be user-configurable.
Logs shall archive entries when the maximum file size is exceeded, to make room for new entries.
The product shall record all error events in persistent storage and shall conform to the system standard (for example, Java products should utilize JSR 47 logging interfaces).
The product shall log all configuration changes, and shall include the following information:
Setting name
New value
Previous value
Timestamp
Name and version of the change agent
The product shall suspend (throttle) the reporting and logging of correctable errors for a fixed period of time when such errors occur at high frequency. Error throttling period values shall be field-adjustable.
The product shall include the following information in every log entry:
Timestamp - preferably in a standard format (RFC 3339 or similar)
Module - Acronym of the module generating the message
LogLevel - Indicates relative severity, following an applicable standard if any.
MessageId - Unique within the product
MessageText - brief text explaining the event
ContextInfo - key values that go with the message (context). Include stack trace if program error is suspected.
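For Java products, the logging guidelines map naturally onto the JSR 47 interfaces (`java.util.logging`). The sketch below assumes a convention that is not part of JSR 47 itself: the logger name carries the module acronym, the first log-record parameter carries the message ID, and any further parameters carry context key/value pairs. `FileHandler` supplies size-limited, rotating log files, the formatter's `getHead` writes the product identifier at the top of each file, and a small hypothetical `ErrorThrottle` class suppresses bursts of correctable errors for a configurable period. All class names and the field layout are illustrative.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Duration;
import java.time.Instant;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical JSR 47 formatter. Convention assumed here: logger name = module
// acronym, parameters[0] = message id, remaining parameters = context.
final class ServiceFormatter extends Formatter {
    private final String productId;   // product name, version and patch id

    ServiceFormatter(String productId) {
        this.productId = productId;
    }

    @Override
    public String getHead(Handler h) {   // written at the top of each log file
        return "# " + productId + System.lineSeparator();
    }

    @Override
    public String format(LogRecord r) {
        Object[] p = r.getParameters();
        String id = (p != null && p.length > 0) ? String.valueOf(p[0]) : "-";
        StringBuilder ctx = new StringBuilder();
        for (int i = 1; p != null && i < p.length; i++) {
            ctx.append(i > 1 ? "," : "").append(p[i]);
        }
        StringBuilder sb = new StringBuilder()
                .append(r.getInstant()).append('|')     // ISO 8601 / RFC 3339, UTC
                .append(r.getLoggerName()).append('|')
                .append(r.getLevel().getName()).append('|')
                .append(id).append('|')
                .append(r.getMessage()).append('|')
                .append(ctx).append(System.lineSeparator());
        if (r.getThrown() != null) {                     // stack trace for program errors
            StringWriter sw = new StringWriter();
            r.getThrown().printStackTrace(new PrintWriter(sw));
            sb.append(sw);
        }
        return sb.toString();
    }

    // Size-limited, rotating log: when limitBytes is reached the handler starts a new
    // file and keeps at most count generations (pattern should contain %g).
    static Logger openLog(String module, String pattern, int limitBytes, int count,
                          String productId) throws IOException {
        FileHandler fh = new FileHandler(pattern, limitBytes, count, true);
        fh.setFormatter(new ServiceFormatter(productId));
        Logger log = Logger.getLogger(module);
        log.setUseParentHandlers(false);
        log.addHandler(fh);
        return log;
    }
}

// Suppresses reports for throttlePeriod once more than threshold arrive within window.
final class ErrorThrottle {
    private final int threshold;
    private final Duration window;
    private final Duration throttlePeriod;   // field-adjustable, e.g. read from configuration
    private Instant windowStart = Instant.EPOCH;
    private Instant suppressedUntil = Instant.EPOCH;
    private int count;

    ErrorThrottle(int threshold, Duration window, Duration throttlePeriod) {
        this.threshold = threshold;
        this.window = window;
        this.throttlePeriod = throttlePeriod;
    }

    // Returns true if this correctable error should be logged.
    synchronized boolean allow(Instant now) {
        if (now.isBefore(suppressedUntil)) {
            return false;                          // inside a throttle period
        }
        if (!now.isBefore(windowStart.plus(window))) {
            windowStart = now;                     // start a new counting window
            count = 0;
        }
        if (++count > threshold) {
            suppressedUntil = now.plus(throttlePeriod);
            count = 0;
            return false;
        }
        return true;
    }
}
```

A caller would check `throttle.allow(Instant.now())` before logging a correctable error, with the threshold, window, and throttle period read from configuration so they remain field-adjustable. Note that `FileHandler` rotation overwrites the oldest generation once `count` files exist, so a product that must retain every entry would need a separate archiving step.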
These guidelines present the documentation required by a service provider to aid in the diagnosis and verification of product functionality. Such documentation includes:
Test plans
Tested configurations list
Data scrubbing procedures
Re-certification procedures
Troubleshooting guide
Error code list
List of TCP/IP ports used by the product
Minimum OS configuration list
In addition, access to all documentation (specifications, architecture documents, design, source code, debug and rebuild) is required.
This section presents guidelines for the Solaris Operating System and for products designed to run on Solaris.
Any products for installation on Solaris shall use the Service Management Facility (SMF) to ensure correct daemon startup and management.
DTrace probes shall be utilized to promote ease of diagnosis and traceability.
Product events (errors) shall utilize the Solaris Fault Manager for reporting and diagnosis.
The Image Packaging System (IPS), once completed and available, shall be used for the installation of software packages.