![]() |
|
| Revision | Description | Date | Author |
| 3.0 | Clarifications and other changes, based on discussions with
Army Brown and Dan Debrunner.
|
May 18, 2007 | Rick Hillegas |
| 2.0 | Clarifications, based on discussions with Army Brown, Christian
d'Heureuse, and Dan Debrunner.
|
April 27, 2007 | Rick Hillegas |
| 1.0 | First cut, based on discussions with Dan Debrunner and Christian d'Heureuse. | April 11, 2007 | Rick Hillegas |
Here we describe Derby-style Table Functions, a feature which lets Derby treat user-written ResultSets like tables. Table Functions bolt a tabular api onto arbitrary data feeds. Through Table Functions, external data feeds can drive inserts and queries. Table Functions let people do the following:
Table Functions are described in the SQL Standard:
Derby-style Table Functions are a powerful and elegant blend of the SQL and JDBC standards:
This use of ResultSets is a Derby innovation not described by the SQL Standard. The SQL Standard describes a different convention for implementing the Java methods which materialize Table Functions. Those conventions are described in the SQL Standard (Part 13, sections 8.5 and 8.6) and are summarized on the Java Table Functions wiki page. A later release could build on this feature and implement those conventions too. That work, however, falls outside the scope of this spec.
For the remainder of this spec, we use the following terms:
We introduce new SQL syntax.
New CREATE FUNCTION Syntax
We introduce a new unreserved keyword, DERBY_JDBC_RESULT_SET. This describes a Derby-specific parameter style, which maps Table Functions to ResultSets.
We exhance the CREATE FUNCTION statement as follows. Changes are marked in red.
CREATE FUNCTION function-name ( [ FunctionParameter [, FunctionParameter] ] * ) RETURNS ReturnType [ FunctionElement ] * function-name ::= [ schemaName. ] SQL92Identifier FunctionParameter ::= [ parameter-Name ] DataType ReturnType ::= { DataType | TableType } TableType ::= TABLE ( ColumnElement [, ColumnElement ]* ) ColumnElement ::= ColumnName DataType ColumnName :: = SQL92Identifier FunctionElement ::= { | LANGUAGE { JAVA } | EXTERNAL NAME string | PARAMETER STYLE ParameterStyle | { NO SQL | CONTAINS SQL | READS SQL DATA } | { RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT } } ParameterStyle ::= { JAVA | DERBY_JDBC_RESULT_SET }The following extra rules apply:
- ParameterStyle - If the CREATE FUNCTION statement declares a Table Function, then ParameterStyle must be DERBY_JDBC_RESULT_SET. Otherwise, ParameterStyle must be JAVA.
- DataType - A column in a Table Function may be any Derby datatype except for XML. Support for XML can be added in a later release after Derby can map between SQL Standard XML types and JDBC SQLXML types.
Here is an example:
CREATE FUNCTION externalEmployees () RETURNS TABLE ( employeeId INT, lastName VARCHAR( 50 ), firstName VARCHAR( 50 ), birthday DATE ) LANGUAGE JAVA PARAMETER STYLE DERBY_JDBC_RESULT_SET NO SQL EXTERNAL NAME 'com.acme.hrSchema.EmployeesTable.read' ;Additional SELECT Syntax
We document SELECT statement syntax which allows Table Functions in the FROM list. This syntax is supported today for the builtin diagnostic VTIs and is shown in the Reference Manual in the section titled "SYSCS_DIAG diagnostic tables and functions". However, the 10.2 Reference Manual does not document this syntax in the section titled "FROM clause". Changes to the syntax documented under "FROM clause" are marked in red.
... FROM TableExpression [, TableExpression ] TableExpression ::= { TableOrViewExpression | JOIN operation } TableOrViewExpression ::= { table-Name | view-Name | TableFunctionInvocation } [ [ AS ] correlation-Name [ (Simple-column-Name [ , Simple-column-Name]* ) ] ] ] TableFunctionInvocation ::= TABLE ( function-name ( [ Value [, Value ]* ] ) )The following extra rules apply when including a Table Function in the FROM clause:
- Value - A Value is an expression which could appear as an argument in the invocation of one of the diagnostic VTI functions. This includes literals and ? parameters but may also include the return values of function calls as well as correlated references to columns in outer query blocks.
- correlation-Name - A correlation-name is required for the Table Function, although the AS keyword is optional. This is required by the SQL Standard, Part 2, section 7.6 (<table reference>).
Here is an example:
INSERT INTO employees SELECT s.* FROM TABLE (externalEmployees() ) s;
At runtime, a Derby-style Table Function must resolve to a public static method which returns a ResultSet. That interface is very large. The returned ResultSet only needs to implement the following methods. Unimplemented methods can simply throw a SQLException.
By default, the Derby optimizer makes the following assumptions about a Table Function:
The user can override this optimizer behavior by giving the optimizer more information. Here's how to do this:
Please see Appendix C for advice on how to implement VTICosting.getEstimatedCostPerInstantiation().
See Appendix D for an example VTI complete with an implementation of org.apache.derby.vti.VTICosting.
Derby does not call the getMetaData() method of the ResultSet returned by a Derby-style Table Function. Instead, Derby assumes that the ResultSet has the shape declared by the CREATE FUNCTION statement which defined it. Runtime SQLExceptions may arise if your ResultSet does not fulfill the getXXX() contract implied when you created the function--as described in Appendix A.
We introduce a new Derby type RowMultisetImpl. This is the return type of Table Functions. A RowMultisetImpl contains all of the column names in the returned table as well as their datatypes. For more detail on this new type, see Appendix B.
We change the JDBC database metadata as follows:
| DatabaseMetaData method | Changes | ||||||
| getFunctions() | This metadata method returns a ResultSet describing user-defined functions. One row per function is returned. One column in this ResultSet, FUNCTION_TYPE, describes the function's return type. For Table Functions, this column must contain the value DatabaseMetaData.functionReturnsTable. | ||||||
| getFunctionColumns() |
This metadata method returns a ResultSet describing the return values and parameters of user-defined functions. For Table Functions, this ResultSet must contain one row per parameter and one row per column in the returned table. In a row describing a column in the returned table, the COLUMN_TYPE column is set to DatabaseMetaData.functionColumnResult. and the ORDINAL_POSITION column is set to the position of the column in the returned table's tuple (starting at position 1). So, for example, for a Table Function with P parameters which returns a table that has R columns, this metadata method returns a ResultSet with P+R rows. which have the following values in the following distinguished colmns:
|
Derby-style Table Functions store additional metadata in SYSALIASES.
SYSALIASES
For a user-created Function, there is a row in SYSALIASES. A good deal of metadata lives in the RoutineAliasInfo object which is serialized into the ALIASINFO column of that row. The RoutineAliasInfo object carries additional metadata for Derby-style Table Functions:
- getParameterStyle() - This method returns a new parameter style, PS_DERBY_JDBC_RESULT_SET.
- getReturnType() - This is a RowMultisetImpl. Note that we do not document this new datatype in the Derby user guides. However, this datatype will be visible to users who select SYSALIASES.ALIASINFO.
The org.apache.derby.vti.VTICosting and org.apache.derby.vti.VTIEnvironment interfaces have existed in the Derby codeline since the engine was open-sourced. However, they were not part of Derby's public API. As part of this feature work, we expose these interfaces in Derby's public API. We improve the javadoc for these interfaces so that they talk about Table Functions rather than VTIs. The user guides explain how to use these interfaces.
Soft Upgrade
After a soft-upgrade to Target Release, you see no changes introduced by this spec.
Hard Upgrade
After a hard-upgrade to Target Release, you see the following behavior:
Scenario Old behavior New behavior Customer needs to make these changes... Creating Derby-style Table Functions. Could not do. Now you can create these. None. Invoking Derby-style Table Functions in queries. Could not do. Now you can invoke these in the FROM clause. None.
This new behavior requires changes to the user guides:
The Release Notes should mention this feature and should refer the reader to the relevant documentation in the Developer's Guide and the Reference Manual.
| Column Type Declared by CREATE FUNCTION | getXXX() Method Called by Derby |
|---|---|
| BIGINT | getLong() |
| BLOB | getBlob() |
| CHAR | getString() |
| CHAR FOR BIT DATA | getBytes() |
| CLOB | getClob() |
| DATE | getDate() |
| DECIMAL | getBigDecimal() |
| DOUBLE | getDouble() |
| DOUBLE PRECISION | getDouble() |
| FLOAT | getDouble() |
| INTEGER | getInt() |
| LONG VARCHAR | getString() |
| LONG VARCHAR FOR BIT DATA | getBytes() |
| NUMERIC | getBigDecimal() |
| OTHER | getObject() |
| REAL | getFloat() |
| SMALLINT | getShort() |
| TIME | getTime() |
| TIMESTAMP | getTimestamp() |
| VARCHAR | getString() |
| VARCHAR FOR BIT DATA | getBytes() |
| XML | Not supported. |
The Derby type system is a bit involved. This appendix describes the new RowMultisetImpl type returned by a Table Function.
ROW ( ColumnName DataType [, ColumnName DataType ]* ) MULTISET
The following wrapping then occurs in the Derby Type System:
RowMultiSetImpl rmsi = new RowMultiSetImpl( ... ); TypeId multisetTypeID = new TypeId( StoredFormatIds.ROW_MULTISET_ID, rmsi ); // // This is the value returned by the getReturnType() method of // the RoutineAliasInfo stored in the SYSALIASES.ALIASINFO // column for the Table Function. // DataTypeDescriptor dtd = new DataTypeDescriptor( multisetTypeID, true );
The following formula describes how to estimate the value returned by VTICosting.getEstimatedCostPerInstantiation():
C = I * A
where
Optimizer Imprecision
We treat optimizer Imprecision as a constant across the runtime environment. The following formula describes it:
I = O / Twhere
- O = The Optimizer's estimated cost for a plan.
- T = The Total runtime in milliseconds for the plan.
To estimate these values, turn on Derby statistics collection and run the following experiment several times, averaging the results:
- Select = Select all of the rows from a big table.
- Record = In the statistics output, look for the ResultSet which represents the table scan. That scan has a field labelled "optimizer estimated cost". That's O. Now look for the fields in that ResultSet's statistics labelled "constructor time", "open time", "next time", and "close time". Add up all of those fields. That total is T.
For example:
MAXIMUMDISPLAYWIDTH 7000; CALL SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1); CALL SYSCS_UTIL.SYSCS_SET_STATISTICS_TIMING(1); select * from T; values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();Actual Runtime Cost
The following formula explains how to compute the Actual runtime cost for the Table Function:
A = ( P * N ) + Ewhere
- P = The runtime spent Per row (in milliseconds).
- N = The Number of rows in the Table Function.
- E = The time spent creating an Empty instance of the Table Function which has no rows in it. Usually, P * N dwarfs E. That is, the Table Function instantiation cost is very small compared to the actual cost of looping through the rows. However, for some Table Functions, E may be significant and may dominate the Table Function's cost when N is small.
You may know that E is basically 0. If so, you can skip this step. Otherwise, to estimate E, turn on Derby statistics collection and run the following experiment several times, averaging the results:
- Short-circuit = Short-circuit the next() method of the ResultSet returned by your Derby-style Table Function so that it returns false the first time it is called. This makes it appear that the ResultSet has no rows.
- Select = Select all of the rows from the Table Function.
- Record = In the statistics output, look for the VTIResultSet which represents the Table Function scan. Add up the values of the fields in that VTIResultSet's statistics labelled "constructor time", "open time", "next time", and "close time". That total is E.
To estimate P, turn on Derby statistics collection and run the following experiment several times, averaging the results:
- Select = Select all of the rows from the Table Function.
- Record = In the statistics output, look for the VTIResultSet which represents the Table Function scan. Add up the values of the fields in that VTIResultSet's statistics labelled "constructor time", "open time", "next time", and "close time". Subtract E from the result. Now divide by the value of the field "Rows seen". The result is P.
Cost Function
Putting all of this together, the following formula describes the value returned by your Table Function's VTICosting.getEstimatedCostPerInstantiation() method. Variables are given in blue, constants in black, and results in red.
C = I * [ ( P * N ) + E ]
package com.acme.hrSchema;
import java.io.Serializable;
import java.sql.*;
import org.apache.derby.vti.VTICosting;
import org.apache.derby.vti.VTIEnvironment;
/**
* Sample Table Function for reading the employee table in an
* external database. Here's how to use this Table Function:
*
* 1) Register the Table Function with Derby:
*
* CREATE FUNCTION externalEmployees
* ()
* RETURNS TABLE
* (
* employeeId INT,
* lastName VARCHAR( 50 ),
* firstName VARCHAR( 50 ),
* birthday DATE
* )
* LANGUAGE JAVA
* PARAMETER STYLE DERBY_JDBC_RESULT_SET
* NO SQL
* EXTERNAL NAME 'com.acme.hrSchema.EmployeesTable.read'
*
*
* 2) When you want to select from the Table Function, do the following
*
* SELECT s.* FROM TABLE ( externalEmployees() ) s
*
*
* 3) When you are done siphoning out the rows you need, release the
* connection to the external database:
*
* EmployeeTable.close();
*
* Note that this is a clumsy way to manage connections. Among other
* defects, it is not re-entrant. You will want to do something more
* sophisticated for a production-hardened VTI, which
* needs to manage connections to an external database
* on behalf of a multi-user application.
*/
public class EmployeeTable implements VTICosting
{
//////////////////////////////////////////////////////////////////
//
// CONSTANTS
//
//////////////////////////////////////////////////////////////////
private static final String EXTERNAL_DRIVER = "org.gjt.mm.mysql.Driver";
private static final String TABLE_NAME = "hrSchema.EmployeeTable";
//
// Constants used by cost function
//
private static final double I = 100.0; // optimizer imprecision
private static final double P = 10.0; // cost per row in milliseconds
private static final double E = 0.0; // cost of instantiating the external ResultSet
//
// Keys into shared state stored in the VTIEnvironment variable.
//
private static final String ROW_COUNT_KEY = "rowCountKey";
//////////////////////////////////////////////////////////////////
//
// STATE
//
//////////////////////////////////////////////////////////////////
private static Connection _currentConnection;
//////////////////////////////////////////////////////////////////
//
// CONSTRUCTOR
//
//////////////////////////////////////////////////////////////////
/**
* Public no-arg constructor, needed for VTICosting.
*/
public EmployeeTable() {}
//////////////////////////////////////////////////////////////////
//
// FUNCTION REGISTERED WITH DERBY
//
//////////////////////////////////////////////////////////////////
/**
* This is the function which returns a result set for looping
* through the external table.
*/
public static ResultSet read()
throws SQLException
{
Connection conn = getCurrentConnection();
PreparedStatement ps = conn.prepareStatement( "select * from " + TABLE_NAME );
return ps.executeQuery();
}
//////////////////////////////////////////////////////////////////
//
// VTICosting IMPLMENTATION
//
//////////////////////////////////////////////////////////////////
public double getEstimatedRowCount( VTIEnvironment optimizerState ) throws SQLException
{
return getRowCount( optimizerState );
}
public double getEstimatedCostPerInstantiation( VTIEnvironment optimizerState ) throws SQLException
{
double N = getRowCount( optimizerState );
return I * ( ( P * N ) + E );
}
public boolean supportsMultipleInstantiations( VTIEnvironment optimizerState ) throws SQLException
{
return true;
}
//////////////////////////////////////////////////////////////////
//
// OTHER PUBLIC BEHAVIOR
//
//////////////////////////////////////////////////////////////////
/**
* Release our connection to the external database. Call this method when you are
* done siphoning out rows.
*/
public static void close()
throws SQLException
{
if ( _currentConnection != null )
{
_currentConnection.close();
_currentConnection = null;
}
}
//////////////////////////////////////////////////////////////////
//
// MINIONS
//
//////////////////////////////////////////////////////////////////
private static Connection getCurrentConnection()
throws SQLException
{
if ( _currentConnection == null )
{
try { Class.forName( EXTERNAL_DRIVER ); }
catch (ClassNotFoundException e) { throw new SQLException( "Could not find class " + EXTERNAL_DRIVER ); }
_currentConnection = DriverManager.getConnection( "jdbc:mysql://localhost/hr" );
}
return _currentConnection;
}
private double getRowCount( VTIEnvironment optimizerState )
throws SQLException
{
Double estimatedRowCount = (Double) getSharedState( optimizerState, ROW_COUNT_KEY );
if ( estimatedRowCount == null )
{
Connection conn = getCurrentConnection();
PreparedStatement ps = conn.prepareStatement( "select count(*) from " + TABLE_NAME );
ResultSet rs = ps.executeQuery();
rs.next();
estimatedRowCount = new Double( rs.getDouble( 1 ) );
setSharedState( optimizerState, ROW_COUNT_KEY, estimatedRowCount );
rs.close();
ps.close();
}
return estimatedRowCount.doubleValue();
}
private Serializable getSharedState( VTIEnvironment optimizerState, String key )
{
return (Serializable) optimizerState.getSharedState( key );
}
private void setSharedState( VTIEnvironment optimizerState, String key, Serializable value )
{
optimizerState.setSharedState( key, value );
}
}