SQL Statement Execution

The purpose of this part of the guide is to provide some information on how the DP4 SQL Engine executes SQL statements, so that developers have the best chance of writing statements in the most efficient form. You must not develop your system in such a way as to rely on the DP4 SQL Engine executing statements in a particular way: wherever possible we will seek to improve the SQL Engine by adding new optimisations, which may mean the database is accessed in a different way.

The code of the SQL Engine is very complex. When a particular statement is being compiled the Engine may transform the statement in many ways to reduce the number of accesses to the database to the minimum and to minimise the amount of extra work it has to undertake, for example to sort records. These transformations take the form of scoring the various parts of the statement in various ways, to determine which to execute first. It is very difficult to explain these transformations in simple terms, especially for the SELECT statement. In case of doubt use the AUXDEBUG ADC to discover exactly how a statement is executed.

The following explanations mention various flags from time to time:


Before considering the SELECT statement the other statements are briefly considered.

INSERT

An INSERT statement will be executed by a call to rec_fectch_main(EQUAL). If there is a matching record, then the INSERT will fail, unless the table is auto-increment. If a table is auto-increment any call to INSERT will result in a call to rec_autoinc() followed by a call to rec_post_with_flags() specifying either CHECK_PARENT, if omit_checks is not set, or NO_CHECK if it is.

If the go faster flag is set then the record is not fetched initially, and the INSERT may erroneously update an existing record. Therefore,when go faster is set, INSERT statements are only safe on auto-increment tables .

UPDATE

For an explanation of how UPDATE reads the records it will update, see the explanation of the WHERE clause for the SELECT statement.

The updating done by UPDATE statements is executed by a call to rec_post_with_flags specifying the UPDATE_ONLY flag. This will cause a commit failure at the next COMMIT WORK if the record does not exist. If the omit_checks flag is not set the CHECK_PARENT and CHECK_CHANGE flags are also specified.

If an UPDATE statement changes the primary key of the record, it works differently:

  1. First a check is done to see that there is currently no record with this primary key. If there is then the UPDATE fails
  2. Next the existing record is deleted, specifying flags to rec_kill as for a DELETE statement.
  3. The new record is posted, as though for an update, but omitting the UPDATE_ONLY flag.

DELETE

For an explanation of how DELETE reads the records it will update, see the explanation of the WHERE clause for the SELECT statement.

When deleting records the flags are specified as follows:

SELECT

The processing of the SELECT statement can be extremely complex, especially where there are several joins. The goal of the SQL Engine is to produce the correct result set for a query with the least amount of database access through query optimisation. In practice this goal is not always currently achieved, as there are some optimisations which could be performed but which are not. Queries more complicated than the optimiser can deal with may well be executed in a fairly inefficient manner.

WHERE Clause Optimisation

The remarks in this section also apply to ON clauses in SQL2 joins. ON clauses are treated as though they were ANDed with the WHERE clause.

The optimisations performed by DP4 SQL are mainly concerned with identifying the best index to use for reading a table, and reading as few records as possible from a table. For any table in a query the whole table will be read unless an index can be found with N keys for which keys 1 through M, 1 ≤ M ≤ N, there is a condition on the keyfield of the form fieldname=expression or expression=value (where expression is an arbitrary expression that evaluates to a result of the apppropriate type (perhaps even a subquery). What is more there must not be any OR which would allow these fields to take other values. It is much better to issue two queries without an OR than to this query:

select * from orderheader where ordernumber=1 or ordernumber=10000

An alternative, which is optimised, would be to use a UNION of two subqueries.

Important possible classes of optimisation are not performed. For example there is no optimisation which uses start and stop points for searching where BETWEEN, or IN clauses are used, nor of LIKE or MAX searches. Such optimisations will be added as time, resources and general usefulness permit. However many potential optimisations are extremely complex in the general case. Compared with earlier releases the current version of DP4 SQL will optimise many computer generated or modified queries better. Some software has a habit of fully parenthesising all expressions (or predicates to use the official jargon), and prior to the release 4.521/4.617 DP4 SQL gave up trying to optimise when it encountered such parentheses.

To clarify, here are some examples:

Where there are multiple optimisable predicates in the WHERE clause, and there are several candidate indexes, the SQL Engine will pick an index according to one of various rules.

ORDER BY and GROUP BY clauses can both affect the index used for queries, but only if the compilation of the WHERE clause had not already caused an index to be selected.

The use of DISTINCT on a column may also require the SQL Engine to sort data, and according to internal documenation also affects the choice of index. However, this does not appear to be the case currently.

Optimisation of Joins

So far we have considered only queries on a single table. A very important part of optimisation is concerned with optimising the number of fetches required to construct the result set of joins. You may find it helpful to refer to SQL Joins, which is adapted from information previously published on the DP4 User group. This explains how joins are constructed, and introduces some of the issues involved in optimising join construction.

The basic example of a join is the following:

select ordernumber,line from orderheader, orderline where order_header.ordernumber=order_line.ordernumber

which can also be written as:

select ordernumber,line from orderheader join orderline on order_header.ordernumber=order_line.ordernumber

There are two possible efficient schemes for performing this query:

Depending on other aspects of the SELECT statement the SQL Engine may use either of these strategies. In fact for the query as written here the SQL Engine will use the first strategy. If the order of the table names was interchanged it would use the second strategy.

The choice of strategy may be influenced by the type of join used, and by the WHERE clause. This will be illustrated by some examples.

Where more than two tables are read in a single select statement the current SQL engine can occasionally make very poor decisions about the best order to read the tables in. From version 4.523 you can control the order tables are read in, and therefore force the query to be executed in a more efficient way.

Strategies for writing efficient SQL

Some implementations of SQL allow a programmer to supply hints to the SQL Engine as to how a query should be executed. It is reasonable to do this, because as the relatively simple example above shows there is sometimes no absolutely right way to execute a query. In many cases the best strategy will be different depending on the relative number of records in the tables concerned. The DP4 SQL Engine did not support hints until release 4.523. The hint syntax used by DP4 may affect portability of your SQL to some extent - it will cause problems if your SQL engine does not accept C style comments (an extension invented by Oracle). Although SQL has some powerful features, it also has many glaring gaps, of which this is one of the most prominent (another being the impossibility of simply fetching one record in a particular direction relative to one you know about already).

The most important thing to do is to keep statements simple. In many cases there is no particular reason to generate a hugely complex query. If you are accessing the DP4 SQL Engine from a C/C++ program, or indeed any sufficiently powerful language, using either ODBC or pipes to SQLEXE, then you can probably write your own code to combine results from several simpler queries in an appropriate manner.

Other than this, the only real influence you have over the execution of the statement is the order tables are listed in. The first listed table is the least likely to be read multiple times. Therefore you should put the table for which reading repeatedly would be most inefficient first in the list of tables and join other tables from that one, and whenever possible ensure that tabels are listed in an order which will allow efficient use of indexes. If you use the /*DP4SQL O0*/ compiler hint mentioned in the next section the SQL Engine will not change the order of your tables at all, and you can experiment with ordering the tables in different ways to get the best performance.

The order of predicates in a WHERE clause, and the order of expressions on either side of a comparison has no effect on the efficiency of your SQL in DP4.

Compiler Hints

From version 4.523 you can affect the optimisation behaviour of the SQL Engine to some extent, by using a compiler hint that takes the form of a C style comment. (It might be considered better to use an ANSI or ODBC escape syntax, but the comment syntax has the merit that it can be placed anywhere in the SQL statement and is syntactically independent of it). The compiler hint takes the following form:

/* DP4SQL optionvalue... */

The value will always be a single digit. Currently the only options are O to specify how to reorder the tables in a multi table SELECT, and I to specify how to choose the indexes to be used. For example the comment /* DP4SQL O0*/ will force the SQL Engine to access tables in the exact order specified in the FROM clause of a SELECT. You can specify a default value for the O and I options in the [SQLLIB] section of the DP4 Configuration file. Specify a value for the O option by adding a line of the form table_sort=n. Specify a value for the I option by adding a line of the form table_index=n.

The meanings of the various values for the options are given in the table below:

Option Value Meaning
O 0 The tables are accessed in exactly the order specified in the FROM clause
O 1 The tables are reordered so that where possible tables which are used for grouped or higher precedence sorted fields are accessed earliest. It is not possible to reorder tables where one or other is outer joined.
O 2 The tables are reordered so that tables which are mentioned most often in the WHERE clause are accessed first. Tables mentioned equally often are sorted so that where possible (not outer joined) tables which are used for grouped or higher precedence sorted fields are accessed earliest.
O 3 The tables are reordered so that where possible child tables are accessed before parent tables that are mentioned in a condition, and otherwise those which are mentioned most often in the WHERE clause are accessed first. Tables mentioned equally often are sorted so that where possible tables which are used for grouped or sorted fields are accessed earliest.
I 0 All tables are accessed using the prime index
I 1 Tables are accessed using the first index where the total number of keyfields minus the number of leading keys specified by a predicate is as small as possible.
I 2 Tables are accessed using the prime index if all its keys are specified by predicates, or the first index encountered that has the largest leading subset of its keys specified by a predicate.

There should only be at most one hint for each option in a statement. Each hint option applies on a global level to the whole statement - you cannot use multiple hints to force different behaviours on different tables.