Wrong records returned when fetching records using DP4 networking. - K4000026 - Revised 24 Apr 2002

Several related bugs in the DP4 networking software have recently come to light. The first is a long-standing bug, but because of changes elsewhere its severity increased in release 4.616 and again in 4.617. In 4.616/7 it will most likely show up as an Equal and Onward/Backward search in QAB returning the wrong record. The second problem was introduced in 4.521/617.

Problem 1

The underlying cause of the first problem is that the "block fetch" optimisation in AUXDISTR or TCPW (it is implemented by one or the other, depending on which DP4 components are loaded) does not handle fetch(EQUAL|NEXT) or fetch(EQUAL|PREV) correctly. This type of search should return an equal record if there is one, or otherwise the next or previous record. The "block fetch" optimisation ignores the EQUAL part of the flag, so under some circumstances it will return the wrong record. In fact only fairly unusual programs will suffer from this problem, as typically the EQUAL search would only be done the first time through a loop, when the block fetch cache is empty. However, a program that backtracks or repeats an operation can cause this problem to arise.
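The difference between the two behaviours can be sketched in a few lines of Python. This is an illustrative model only, not DP4 code: the flag values, the function names fetch and buggy_fetch, and the use of a sorted list of unique keys in place of an index are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

# Hypothetical flag values; the real DP4 constants are not given in this note.
EQUAL, NEXT, PREV = 1, 2, 4


def fetch(keys, key, flags):
    """Reference behaviour on a sorted list of unique keys.

    EQUAL|NEXT returns the key itself if present, else the next higher key.
    NEXT alone returns the next key strictly after the given one.
    PREV mirrors this in the other direction. Returns None if nothing matches.
    """
    if flags & NEXT:
        i = bisect_left(keys, key) if flags & EQUAL else bisect_right(keys, key)
        return keys[i] if i < len(keys) else None
    if flags & PREV:
        i = bisect_right(keys, key) if flags & EQUAL else bisect_left(keys, key)
        return keys[i - 1] if i > 0 else None
    # EQUAL on its own: exact match or nothing.
    i = bisect_left(keys, key)
    return keys[i] if i < len(keys) and keys[i] == key else None


def buggy_fetch(keys, key, flags):
    """Models the fault: a directional search loses the EQUAL part of its flag."""
    if flags & (NEXT | PREV):
        flags &= ~EQUAL
    return fetch(keys, key, flags)
```

With keys 10, 20 and 30, fetch(keys, 20, EQUAL | NEXT) gives 20 while buggy_fetch gives 30. For a key that is not present, such as 15, both give 20, so in this model the fault is only visible when an equal record exists.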

EQUAL|NEXT and EQUAL|PREV searches are used quite frequently in core DP4 utilities, including PROGCOMP, PROGBIND and MAKELINK, so this bug might cause problems where core DP4 utilities are run over a network.

The problem is more severe in recent releases for two reasons:

This problem is likely to manifest itself only very intermittently, as it depends on both the state of the database and the previous activity on the network. It is therefore advisable to upgrade TCPW or AUXDISTR to guard against this problem causing database consistency errors. An alternative is to use the -noblock command tail on AUXDISTR or TCPW to disable this optimisation. However, disabling the optimisation may adversely affect the performance of some applications.

Problem 2

This bug was introduced in Dec 1999 in another fix to the block fetch optimisation. In order to correctly maintain the cursor (or find list) position for access to a table, the network requester sometimes generates additional calls to fetch records. (Prior to 4.617 this was not done, so the block fetch optimisation was potentially unsafe where both primary and non-unique secondary indexes were used on the same table in the one program.) When doing these fetches the 4.617 network requester, and also AUXDISTR, pass an incorrect, and possibly invalid, value into the database manager for the number system: they pass the database generation number instead of the application number system.

26 January 2001 This fix has had to be reissued, since as originally issued a problem similar to problem 2 was not in fact fixed: as well as passing an invalid number system, an incorrect code for the fetch function was also passed to the database manager in some cases, which could cause a spurious system error 23 when AUXDISTR is in use. Since the wrong fetch function was being called, problem 1 might also still occur in some situations.

22 February 2001 It has become apparent that the network block fetch optimisation could still return wrong records in certain circumstances, especially when a secondary index was being used to read records: the additional calls to fetch described for problem 2 were not made in all cases where they were necessary. In particular, scrolling through records one at a time on a call to pick_record() could cause a problem. (This would only be seen in C programs, as QAB performs its own more efficient optimisations in this area.) Additional calls to fetch are required whenever the last record in the network requester's cache is not the last record returned to the application, regardless of whether a primary or secondary index is in use, since the network requester cannot guarantee that the next read on this table will not use a secondary index.
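The rule in the last sentence can be expressed as a simple check. The Python sketch below is a toy model under stated assumptions: the class RequesterCache and its methods are hypothetical names, and it assumes the database manager's position for the table is left at the last record read ahead into the block.

```python
class RequesterCache:
    """Toy model of a block fetch cache; all names here are hypothetical."""

    def __init__(self, block):
        self.block = list(block)  # records read ahead in one block
        self.pos = -1             # index of the last record given to the application

    def next_record(self):
        """Hand the next cached record to the application."""
        self.pos += 1
        return self.block[self.pos]

    def needs_extra_fetch(self):
        # Assumption: the position held for this table is at the last record
        # of the block. If the application has only been given an earlier
        # record, an additional fetch is needed to restore the position,
        # whichever index is in use.
        return self.pos != len(self.block) - 1
```

For a block of three records, needs_extra_fetch() stays true until the application has been given the third record.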

This code has now been thoroughly reworked, as in any case the optimisation was not as effective as it should have been in these circumstances. Because pick_record() tends to backtrack one record when scrolling through records one at a time, records were typically only fetched in blocks of two. Records will now be fetched in gradually larger blocks (up to a limit of 100 records at once) the longer the user scrolls through the list. As a result, the number of database calls across the network will be roughly half what it would be otherwise. (The saving cannot be greater because the fetch functions called by pick_record() typically also make one call to rec_fetch_main(EQUAL) per line scrolled, and this cannot be eliminated by the network requester.)
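A rough feel for these numbers can be had from the Python sketch below. It rests on several assumptions that this note does not confirm: that block sizes double each time (the note only says "gradually larger"), that the first block holds two records, and that an unoptimised scroll costs one directional fetch plus one rec_fetch_main(EQUAL) call per line. Only the limit of 100 comes from the text above; the function names are made up for the example.

```python
MAX_BLOCK = 100  # upper limit stated in the note


def next_block_size(current):
    """Hypothetical growth policy: double the block size, capped at MAX_BLOCK."""
    return min(max(current, 1) * 2, MAX_BLOCK)


def block_sizes(start=2, steps=8):
    """Sizes of the first `steps` blocks under the assumed doubling policy."""
    sizes = [start]
    for _ in range(steps - 1):
        sizes.append(next_block_size(sizes[-1]))
    return sizes


def block_fetches(lines, start=2):
    """Number of block fetches needed to cover `lines` scrolled records."""
    fetched, size, trips = 0, start, 0
    while fetched < lines:
        fetched += size
        trips += 1
        size = next_block_size(size)
    return trips


def calls_with_block_fetch(lines):
    # One EQUAL call per line (not removable) plus the block fetches.
    return lines + block_fetches(lines)


def calls_without_block_fetch(lines):
    # Assumed: one directional fetch and one EQUAL call per line.
    return 2 * lines
```

Under these assumptions, scrolling 200 lines takes 7 block fetches, for 207 calls in total against 400 without the optimisation, which is consistent with the "roughly half" figure.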

21 May 2001 Unfortunately the 22 Feb rework was still not correct: when accessing variable length base dictionary tables, a protection violation could occur in some circumstances. In particular, running BROWSER or MAPEDIT with AUXDISTR loaded could cause AUXDISTR to crash. Although not yet observed with the network requesters alone, the same problem was potentially present in them.

07 Jun 2001 Even after all the above fixes, there was still a problem if programs updated records in a table being fetched on a non-unique secondary index, as the additional fetch that is required was not being done in this case.

24 April 2002 Almost unbelievably, there were STILL at least three bugs left in this small bit of code:

-noblock

In the light of the problems in this area, those of a nervous disposition may understandably wish to dispense with the network block fetch optimisation altogether. You can do this by specifying -noblock on the command line to the network requester or, when AUXDISTR is in use, on the command line to AUXDISTR instead. Please refer to the new DP4 Network Resilience Developers Manual for further information. If you would like an estimate of the maximum possible benefit you would obtain by using this optimisation, please refer to e4000024.

DP4 Products/Versions Affected

4.5xx, 4.6xx. All network client software is affected (i.e. NTBx, IPXx and TCPx), and also the DP4 4680/4690 Basic interface.

Where the version affected is given as 4.5xx or 4.6xx, all versions of DP4 issued prior to the date of the fix are potentially affected. Where a specific version number is given, the problem was introduced by that release and prior releases are unaffected. If a patch release number is also specified (in parentheses), the fault was introduced at that specific patch level.

Downloads

Please refer to k4000045.htm for downloads and further information.