In DP4 the word join is typically used to denote the action of fetching a parent record whose primary key is available from corresponding foreign key fields in a child table. This type of join is very easy using the built in processing in QAB and other DP4 utilities such as ReportWriter. The Index analysis program (MAKELINK) analyses the data dictionary and constructs a table called @PC_RECORD (short for parent child record) which describes all the possible direct (and indirect) joins between tables. Except in the case where there is more than one instance of a foreign key in the child table (possible in DP4 through the role concept) a join is typically thought of as the unique parent record that corresponds to the child. In the DP4 API a program joins a table by copying the key fields from the child record that has already been read into the key fields of the parent and calling a DP4 fetch function to read the EQUAL record on the main index.
Typically we might do something like this to retrieve an order and the associated customer record:
In SQL a join is any kind of SELECT statement where columns (fields in DP4 terminology) are fetched from more than one table.
The fundamental SQL join is the cross or Cartesian product of the tables, consisting of all possible pairs of rows selecting one row from the first table , and one row from the second table . Any more useful join is expressed by restricting the rows returned through the use of a WHERE clause, or an equivalent construct (of which there are several in the SQL2 standard). In SQL the traditional way of expressing the DP4 code above would be something like:
In order to return the single order and customer record the SQL engine notionally constructs a temporary table consisting of all possible pairs of orders and customers and then filters it leaving only the one row of interest. Of course you have to hope that the SQL engine does not work like this in practice. If you have a database of 10 customers and 50 orders it won't matter too much. If you have 10000 customers and 50000 orders then a SQL engine that worked like this would take all week to find one record.
A SQL engine will contain various optimisations that enable it to recognise certain kinds of query and execute them efficiently by limiting the records actually read from the database in some way.
It is important to realise that any SQL engine will have only a limited range of optimisations. When the SQL engine is unable to optimise a query, it really will construct the Cartesian product of the all the tables used in the query, and then filter it. Therefore you should satisfy yourself that any queries you construct are capable of being executed within a reasonable time on a database of the intended size, and not a "noddy" development database with just a few items of test data.
In SQL a join can be constructed using any kind of valid expression. Thus you could equally well have:
This obviously fetches the customer who placed the order number
10000 less than the one you are thinking of now.
More interestingly, though not especially useful is this
example:
This would retrieve the list of all customers with a customer number less than the customer for this order.
In SQL the parent child concept, which is central to DP4, is almost invisible.
This query is syntactically identical to the first example. However, since presumably there can be many order lines in a single order, the query may return many rows.
DP4 also supports the notion of an indirect join . For example the CUSTOMER table might well have an ACCOUNT_MANAGER field which was a foreign key into an EMPLOYEE table. A program that takes orders might need to check the customer's credit rating. If the customer has a poor payment history it might want to look up the account manager and automatically send a memo to him, requesting authorisation of the order. The DP4 program has to add the following code:
Note that the program cannot retrieve the "correct" employee
details until it has first read the appropriate customer
record.
The SQL for this would be something like:
(The details would vary according to how the ACCOUNT_MANAGER foreign key was named. Only DP4 would support CUSTOMER.ACCOUNT_MANAGER.EMP_NR as other databases lack roles)
Consider our first example again (fetching an order and its associated customer record). Up until now we have not considered the possibility that an order record might exist for which the customer record was missing for some reason. In DP4 this situation is called join failure . It is usually up to the application to decide how to handle this situation. Note that the DP4 program will manage to retrieve the order even though the customer does not exist.
In SQL the situation is a little different. Recall that the result of a query is a subset of the Cartesian product of the rows. Now if there is no customer with a particular customer number, any query involving selecting a customer with a customer number equal to that number will necessarily return 0 rows. This could potentially cause confusion - a naive programmer might jump to the conclusion that when the
query fails it is because there is no order with that number.
One potential solution is to issue two SQL queries instead of one query with a join. However this is likely to be inefficient and if the join is complex then it may not be an option. Because of this the notion of an Outer join was added. Roughly speaking, in an outer join if there is no record that satisfies the condition on the record an imaginary "null" record is created, and this record is deemed to satisfy the condition involving that record. "Regular" joins are therefore sometimes known as Inner joins .
The notion of outer joins was not part of SQL originally. As a result outer joins have been implemented in different ways by different database products. For example in MS SQL Server outer joins are apparently specified by using a syntax using *= instead of = in where conditions. In Oracle a completely different syntax using (+) was invented. The SQL2 standard introduced many extensions allowing the specification of joins of various types. Unfortunately this standard is far from being universal as yet - Oracle 7 does not understand SQL2 joins for example, and Crystal reports does not use SQL2 syntax to specify outer joins. The SQL2 syntax has the merit of allowing multiple outer joins to be specified in a non-ambiguous way, which is not the case with the other syntaxes.
Before going any further it would be best to explain the concept of an outer join more accurately. In the example we want our query to return an order record, whether or not the corresponding customer record exists. The Oracle and Crystal reports syntax for doing this is as follows:
The (+) directs Oracle, (or other databases that understand this syntax) to return a NULL row when there is no customer with the customer number matching the customer number in the order. All the fields in the customer record returned in this case will be NULL. (I am not quite sure what the value in the CUST_NR field itself would or should be. In the DP4 implementation this field would also be NULL, which is not perhaps ideal). It should be noted that the field the (+) follows is significant. It establishes which table has imaginary null records added. I should perhaps stress that these imaginary nulls are only hovering in the wings to be called upon when there is no other way of satisfying the condition of the join. If this were not the case the output of the query would be peppered with unwanted null records.
If we had a more general query (missing out the condition on the order number) we would end up with a list of all orders on the database, whether or not the corresponding customer record existed. Another query we might like to issue is one that lists the most recent order for any customer.
This query would miss out customers who had not placed an order yet. In order to get them included as well we would specify an outer join. (The GROUP BY clause is needed otherwise the query would only return the customer who had placed the most recent order of all)
Note that now the (+) is appearing after the ORDER field, because we want imaginary nulls in the ORDER table.
This query could equally well be written:
In SQL2, rather than using a WHERE clause, joins can be specified using various join operators and a USING or ON clause. The ON clause contains the part of the condition that is concerned with limiting which rows of the two tables being joined correspond.
Using SQL2 syntax the last two queries would probably be expressed as:
However the queries can equally well be expressed the other way around:
These queries would be bound to return the same result set, though possibly in a different order. (This is an example of where optimisation of queries can cause trouble - in DP4 the "natural" form of the queries is more likely to be executed more optimally, than back to front equivalents.) In all four cases this syntax joins the tables using the inner join, so will not list the order if the customer is missing, or the customers without orders. If desired the word INNER can precede the word JOIN. (Probably this is rarely if ever done, though doubtless pedantic reference books will recommend programmers to use it "to avoid ambiguity")
The first two cases can be converted into outer joins by using the word OUTER:
However, if we consider the alternate form of these queries it should become obvious that this syntax is potentially ambiguous. In SQL2 there are both LEFT and RIGHT outer joins.
FROM ORDER LEFT OUTER JOIN CUSTOMER
implies
a join which adds imaginary nulls to the CUSTOMER table.FROM ORDER RIGHT OUTER JOIN CUSTOMER
implies a join which adds imaginary nulls to the ORDER table.In some sense the table on the left of a LEFT
outer
join is the "main" table as is the table on the right of a
RIGHT
outer join. Normally the SQL engine will want to
discover records in that table first before attempting to discover
which if any records in the other table match.
The notions of left and right outer join are essentially completely equivalent. However, there are probably cases where one or the other (usually LEFT) is more natural than the other.
If a SQL2 join expression specifies the use of LEFT
or RIGHT
then the word OUTER
is optional
(and superfluous). If OUTER
is specified on its own it
implies a left outer join.
It should be noted that using the SQL2 syntax a left outer join will most naturally correspond to an Oracle statement where the (+) appeared on the right hand side of a comparison.
In DP4 SQL you should refrain from using right outer joins where possible. Internally the DP4 SQL engine generates right outer joins by interchanging the two tables and using a left join instead. In a few cases join expressions involving nested joins cannot be converted properly resulting in an internal error from the DP4 SQL engine. This will be fixed when possible if it becomes necessary.
In ODBC, outer joins are considered to be an extension, and are implemented using a so called escape syntax.
becomes instead:
In the ODBC version of outer joins I believe that the word OUTER is mandatory in theory. ODBC drivers for databases such as Oracle are likely to mangle this syntax into the form expected by the database's native SQL support. This is a bit of a joker in the pack, as some drivers generate completely invalid SQL as a result. For example one early driver generated SELECT statements with two WHERE clauses if both an OUTER and an ordinary join were used.
If we return to our example where we also joined the account
manager record from the employee table you will see that a nested
outer join may be necessary. In this case the FROM
part of the query using inner joins would be something like:
Our interpretation of the syntax of nested joins is as follows:
All join operators are binary operators between two tables, the result of which returns a new table containing all the fields from both tables for the purposes of nesting joins. (Actually, the ON or USING clause may mean that JOIN is actually the first operator symbol in a ternary operator).
In an unparenthesised join expression involving multiple joins the joins must be constructed in is determined by the position of ON or USING CLAUSES: the last two tables encountered are joined when an ON or USING clause is encountered. For union or cross joins there is no ON or USING clause, so the tables on the left and right of these operators can be joined immediately. Thus A LEFT JOIN B LEFT JOIN C ON X ON Y, must first joins B and C and then joins the result with A, whereas A LEFT JOIN B ON X LEFT JOIN C ON Y first joins A and B, and then joins the result with C.
In the DP4 SQL engine nested JOIN operations should be parenthesised. This is possibly not required by SQL2 syntax, though I could find no examples in my book which did not parenthesise them. Currently not all unparenthesised nested joins are compiled.
However inner,union or cross joins are combined with each other or a single outer join the end result will be the same. Where multiple OUTER joins are used this may not be the case since in a nested JOIN expression different ways of expressing the same join can yield differing rows with imaginary null records.