Functional Spec for Derby-Style Table Functions

Revision Description Date Author
3.0 Clarifications and other changes, based on discussions with Army Brown and Dan Debrunner.
  • Wordsmithed the description of the kinds of expressions which can appear as arguments to the invocation of Table Functions.
  • Added some explanatory text to the Sample VTI, explaining that more sophisticated connection management would be needed for a production-hardened, client-server application.
  • Retained the org.apache.derby.vti.VTIEnvironment arguments to the org.apache.derby.vti.VTICosting methods and exposed org.apache.derby.vti.VTIEnvironment in the Public API.
  • Removed the appendix showing new javadoc for VTICosting.
  • Improved the Sample VTI section with an example of how to use org.apache.derby.vti.VTIEnvironment.
May 18, 2007 Rick Hillegas
2.0 Clarifications, based on discussions with Army Brown, Christian d'Heureuse, and Dan Debrunner.
  • Renamed DERBY parameter style to DERBY_JDBC_RESULT_SET.
  • Removed confusing description of a Derby-style Table Function's ResultSetMetaData. Removed this from the section on Runtime Behavior.
  • Added an appendix showing a sample VTI complete with VTICosting implementation.
  • Add parentheses to syntax and example of TABLE constructors in FROM clauses.
  • Clarified the kind of arguments that can appear in the constructors of Table Functions.
  • Removed parameters and return columns from SYS.SYSCOLUMNS.
  • Note that RowMultisetImpl is only accidentally exposed to users.
April 27, 2007 Rick Hillegas
1.0 First cut, based on discussions with Dan Debrunner and Christian d'Heureuse. April 11, 2007 Rick Hillegas

Overview

Here we describe Derby-style Table Functions, a feature which lets Derby treat user-written ResultSets like tables. Table Functions bolt a tabular api onto arbitrary data feeds. Through Table Functions, external data feeds can drive inserts and queries. Table Functions let people do the following:

Table Functions are described in the SQL Standard:

Derby-style Table Functions are a powerful and elegant blend of the SQL and JDBC standards:

This use of ResultSets is a Derby innovation not described by the SQL Standard. The SQL Standard describes a different convention for implementing the Java methods which materialize Table Functions. Those conventions are described in the SQL Standard (Part 13, sections 8.5 and 8.6) and are summarized on the Java Table Functions wiki page. A later release could build on this feature and implement those conventions too. That work, however, falls outside the scope of this spec.

Terms

For the remainder of this spec, we use the following terms:

New SQL Syntax

We introduce new SQL syntax.

New CREATE FUNCTION Syntax

We introduce a new unreserved keyword, DERBY_JDBC_RESULT_SET. This describes a Derby-specific parameter style, which maps Table Functions to ResultSets.

We exhance the CREATE FUNCTION statement as follows. Changes are marked in red.


CREATE FUNCTION function-name
(
   [ FunctionParameter 
   [, FunctionParameter] ] *
)
RETURNS ReturnType
[ FunctionElement ] *

function-name ::= [ schemaName. ] SQL92Identifier

FunctionParameter ::= [ parameter-Name ] DataType


ReturnType ::= { DataType | TableType }

TableType ::= TABLE ( ColumnElement [, ColumnElement ]* )

ColumnElement ::= ColumnName DataType

ColumnName :: = SQL92Identifier


FunctionElement ::=
{
 | LANGUAGE { JAVA }
 | EXTERNAL NAME string
 | PARAMETER STYLE ParameterStyle
 | { NO SQL | CONTAINS SQL | READS SQL DATA }
 | { RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT }
}


ParameterStyle ::= { JAVA | DERBY_JDBC_RESULT_SET }

The following extra rules apply:

Here is an example:


CREATE FUNCTION externalEmployees
()
RETURNS TABLE
(
  employeeId    INT,
  lastName       VARCHAR( 50 ),
  firstName      VARCHAR( 50 ),
  birthday         DATE
)
LANGUAGE JAVA
PARAMETER STYLE DERBY_JDBC_RESULT_SET
NO SQL
EXTERNAL NAME 'com.acme.hrSchema.EmployeesTable.read'
;

Additional SELECT Syntax

We document SELECT statement syntax which allows Table Functions in the FROM list. This syntax is supported today for the builtin diagnostic VTIs and is shown in the Reference Manual in the section titled "SYSCS_DIAG diagnostic tables and functions". However, the 10.2 Reference Manual does not document this syntax in the section titled "FROM clause". Changes to the syntax documented under "FROM clause" are marked in red.


...

FROM TableExpression [, TableExpression ]

TableExpression ::=
{
  TableOrViewExpression | JOIN operation 
}

TableOrViewExpression ::=
{ table-Name | view-Name | TableFunctionInvocation }
[
  [ AS ] correlation-Name
  [ (Simple-column-Name [ , Simple-column-Name]* ) ] ]
]


TableFunctionInvocation ::= TABLE ( function-name ( [ Value [, Value ]* ] ) )

The following extra rules apply when including a Table Function in the FROM clause:

Here is an example:


INSERT INTO employees
  SELECT s.*
    FROM TABLE (externalEmployees() ) s;

User-written ResultSets

At runtime, a Derby-style Table Function must resolve to a public static method which returns a ResultSet. That interface is very large. The returned ResultSet only needs to implement the following methods. Unimplemented methods can simply throw a SQLException.

Optimizer Support

By default, the Derby optimizer makes the following assumptions about a Table Function:

The user can override this optimizer behavior by giving the optimizer more information. Here's how to do this:

Please see Appendix C for advice on how to implement VTICosting.getEstimatedCostPerInstantiation().

See Appendix D for an example VTI complete with an implementation of org.apache.derby.vti.VTICosting.

Runtime Behavior

Derby does not call the getMetaData() method of the ResultSet returned by a Derby-style Table Function. Instead, Derby assumes that the ResultSet has the shape declared by the CREATE FUNCTION statement which defined it. Runtime SQLExceptions may arise if your ResultSet does not fulfill the getXXX() contract implied when you created the function--as described in Appendix A.

Type System

We introduce a new Derby type RowMultisetImpl. This is the return type of Table Functions. A RowMultisetImpl contains all of the column names in the returned table as well as their datatypes. For more detail on this new type, see Appendix B.

DatabaseMetaData

We change the JDBC database metadata as follows:

DatabaseMetaData method Changes
getFunctions() This metadata method returns a ResultSet describing user-defined functions. One row per function is returned. One column in this ResultSet, FUNCTION_TYPE, describes the function's return type. For Table Functions, this column must contain the value DatabaseMetaData.functionReturnsTable.
getFunctionColumns()

This metadata method returns a ResultSet describing the return values and parameters of user-defined functions. For Table Functions, this ResultSet must contain one row per parameter and one row per column in the returned table. In a row describing a column in the returned table, the COLUMN_TYPE column is set to DatabaseMetaData.functionColumnResult. and the ORDINAL_POSITION column is set to the position of the column in the returned table's tuple (starting at position 1). So, for example, for a Table Function with P parameters which returns a table that has R columns, this metadata method returns a ResultSet with P+R rows. which have the following values in the following distinguished colmns:

COLUMN_TYPE ORDINAL_POSITION
DatabaseMetaData.functionColumnIn, DatabaseMetaData.functionColumnOut, or DatabaseMetaData.functionColumnInOut 1 - P
DatabaseMetaData.functionColumnResult 1 - R

System Tables

Derby-style Table Functions store additional metadata in SYSALIASES.

SYSALIASES

For a user-created Function, there is a row in SYSALIASES. A good deal of metadata lives in the RoutineAliasInfo object which is serialized into the ALIASINFO column of that row. The RoutineAliasInfo object carries additional metadata for Derby-style Table Functions:

Public API

The org.apache.derby.vti.VTICosting and org.apache.derby.vti.VTIEnvironment interfaces have existed in the Derby codeline since the engine was open-sourced. However, they were not part of Derby's public API. As part of this feature work, we expose these interfaces in Derby's public API. We improve the javadoc for these interfaces so that they talk about Table Functions rather than VTIs. The user guides explain how to use these interfaces.

Upgrade

Soft Upgrade

After a soft-upgrade to Target Release, you see no changes introduced by this spec.

Hard Upgrade

After a hard-upgrade to Target Release, you see the following behavior:

Scenario Old behavior New behavior Customer needs to make these changes...
Creating Derby-style Table Functions. Could not do. Now you can create these. None.
Invoking Derby-style Table Functions in queries. Could not do. Now you can invoke these in the FROM clause. None.

Documentation

This new behavior requires changes to the user guides:

Release Notes

The Release Notes should mention this feature and should refer the reader to the relevant documentation in the Developer's Guide and the Reference Manual.

Appendix A: Invoked getXXX() Methods

getXXX() Methods Called for Declared SQL Types
Column Type Declared by CREATE FUNCTION getXXX() Method Called by Derby
BIGINT getLong()
BLOB getBlob()
CHAR getString()
CHAR FOR BIT DATA getBytes()
CLOB getClob()
DATE getDate()
DECIMAL getBigDecimal()
DOUBLE getDouble()
DOUBLE PRECISION getDouble()
FLOAT getDouble()
INTEGER getInt()
LONG VARCHAR getString()
LONG VARCHAR FOR BIT DATA getBytes()
NUMERIC getBigDecimal()
OTHER getObject()
REAL getFloat()
SMALLINT getShort()
TIME getTime()
TIMESTAMP getTimestamp()
VARCHAR getString()
VARCHAR FOR BIT DATA getBytes()
XML Not supported.

Appendix B: Type System Details

The Derby type system is a bit involved. This appendix describes the new RowMultisetImpl type returned by a Table Function.

The following wrapping then occurs in the Derby Type System:

RowMultiSetImpl   rmsi = new RowMultiSetImpl( ... );
TypeId  multisetTypeID = new TypeId( StoredFormatIds.ROW_MULTISET_ID, rmsi );

//
// This is the value returned by the getReturnType() method of
// the RoutineAliasInfo stored in the SYSALIASES.ALIASINFO
// column for the Table Function.
//
DataTypeDescriptor dtd = new DataTypeDescriptor( multisetTypeID, true );

Appendix C: VTICosting

The following formula describes how to estimate the value returned by VTICosting.getEstimatedCostPerInstantiation():

C = I * A

where

Optimizer Imprecision

We treat optimizer Imprecision as a constant across the runtime environment. The following formula describes it:

I = O / T

where

To estimate these values, turn on Derby statistics collection and run the following experiment several times, averaging the results:

For example:


MAXIMUMDISPLAYWIDTH 7000;

CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1);

select * from T;

values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();

Actual Runtime Cost

The following formula explains how to compute the Actual runtime cost for the Table Function:

A = ( P * N ) + E

where

You may know that E is basically 0. If so, you can skip this step. Otherwise, to estimate E, turn on Derby statistics collection and run the following experiment several times, averaging the results:

To estimate P, turn on Derby statistics collection and run the following experiment several times, averaging the results:

Cost Function

Putting all of this together, the following formula describes the value returned by your Table Function's VTICosting.getEstimatedCostPerInstantiation() method. Variables are given in blue, constants in black, and results in red.

C = I * [ ( P * N ) + E ]

Appendix D: Sample VTI


package com.acme.hrSchema;

import java.io.Serializable;
import java.sql.*;

import org.apache.derby.vti.VTICosting;
import org.apache.derby.vti.VTIEnvironment;

/**
 * Sample Table Function for reading the employee table in an
 * external database. Here's how to use this Table Function:
 *
 * 1) Register the Table Function with Derby:
 *
 *    CREATE FUNCTION externalEmployees
 *    ()
 *    RETURNS TABLE
 *    (
 *      employeeId    INT,
 *      lastName       VARCHAR( 50 ),
 *      firstName      VARCHAR( 50 ),
 *      birthday         DATE
 *    )
 *    LANGUAGE JAVA
 *    PARAMETER STYLE DERBY_JDBC_RESULT_SET
 *    NO SQL
 *    EXTERNAL NAME 'com.acme.hrSchema.EmployeesTable.read'
 *
 *
 * 2) When you want to select from the Table Function, do the following
 *
 *    SELECT s.* FROM TABLE ( externalEmployees() ) s
 *
 *
 * 3) When you are done siphoning out the rows you need, release the
 *      connection to the external database:
 *
 *    EmployeeTable.close();
 *
 * Note that this is a clumsy way to manage connections. Among other
 * defects, it is not re-entrant. You will want to do something more
 * sophisticated for a production-hardened VTI, which
 * needs to manage connections to an external database
 * on behalf of a multi-user application.
 */

public class EmployeeTable implements VTICosting
{
    //////////////////////////////////////////////////////////////////
    //
    //  CONSTANTS
    //
    //////////////////////////////////////////////////////////////////

    private static  final   String  EXTERNAL_DRIVER = "org.gjt.mm.mysql.Driver";
    private static  final   String  TABLE_NAME = "hrSchema.EmployeeTable";

    //
    // Constants used by cost function
    //
    private static  final   double  I = 100.0;  // optimizer imprecision
    private static  final   double  P = 10.0;   // cost per row in milliseconds
    private static  final   double  E = 0.0;    // cost of instantiating the external ResultSet
    
    //
    // Keys into shared state stored in the VTIEnvironment variable.
    //
    private static  final   String  ROW_COUNT_KEY = "rowCountKey";
    
    //////////////////////////////////////////////////////////////////
    //
    //  STATE
    //
    //////////////////////////////////////////////////////////////////

    private static  Connection  _currentConnection;

    //////////////////////////////////////////////////////////////////
    //
    //  CONSTRUCTOR
    //
    //////////////////////////////////////////////////////////////////
    
    /**
     * Public no-arg constructor, needed for VTICosting.
     */
    public  EmployeeTable() {}

    //////////////////////////////////////////////////////////////////
    //
    //  FUNCTION REGISTERED WITH DERBY
    //
    //////////////////////////////////////////////////////////////////

    /**
     * This is the function which returns a result set for looping
     * through the external table. 
     */
    public  static  ResultSet   read()
        throws SQLException
    {
        Connection                  conn = getCurrentConnection();
        PreparedStatement   ps = conn.prepareStatement( "select * from " + TABLE_NAME );

        return ps.executeQuery();
    }
    
    //////////////////////////////////////////////////////////////////
    //
    //  VTICosting IMPLMENTATION
    //
    //////////////////////////////////////////////////////////////////

    public double getEstimatedRowCount( VTIEnvironment optimizerState ) throws SQLException
    {
        return getRowCount( optimizerState );
    }

    public double getEstimatedCostPerInstantiation( VTIEnvironment optimizerState ) throws SQLException
    {
        double      N = getRowCount( optimizerState );

        return I * ( ( P * N ) + E );
    }
    
    public boolean supportsMultipleInstantiations( VTIEnvironment optimizerState ) throws SQLException
    {
        return true;
    }

    //////////////////////////////////////////////////////////////////
    //
    //  OTHER PUBLIC BEHAVIOR
    //
    //////////////////////////////////////////////////////////////////
    
    /**
     * Release our connection to the external database. Call this method when you are
     * done siphoning out rows.
     */
    public  static  void    close()
        throws SQLException
    {
        if ( _currentConnection != null )
        {
            _currentConnection.close();
            _currentConnection = null;
        }
    }

    //////////////////////////////////////////////////////////////////
    //
    //  MINIONS
    //
    //////////////////////////////////////////////////////////////////

    private static Connection getCurrentConnection()
        throws SQLException
    {
        if ( _currentConnection == null )
        {
            try { Class.forName( EXTERNAL_DRIVER ); }
            catch (ClassNotFoundException e) { throw new SQLException( "Could not find class " + EXTERNAL_DRIVER ); }

            _currentConnection = DriverManager.getConnection( "jdbc:mysql://localhost/hr" );
        }
        
        return _currentConnection;
    }

    private double  getRowCount( VTIEnvironment optimizerState )
        throws SQLException
    {
        Double          estimatedRowCount = (Double) getSharedState( optimizerState, ROW_COUNT_KEY );
        
        if ( estimatedRowCount == null )
        {
            Connection              conn = getCurrentConnection();
            PreparedStatement   ps = conn.prepareStatement( "select count(*) from " + TABLE_NAME );
            ResultSet                   rs = ps.executeQuery();

            rs.next();
            estimatedRowCount = new Double( rs.getDouble( 1 ) );
            
            setSharedState( optimizerState, ROW_COUNT_KEY, estimatedRowCount );

            rs.close();
            ps.close();
        }

        return estimatedRowCount.doubleValue();
    }

    private Serializable  getSharedState( VTIEnvironment optimizerState, String key )
    {
        return (Serializable) optimizerState.getSharedState( key );
    }

    private void    setSharedState( VTIEnvironment optimizerState, String key, Serializable value )
    {
        optimizerState.setSharedState( key, value );
    }
    
}