Effect of Networking on Commit Failures

Network failures

This section applies to Local Database Resilience and Multiple Resilience Configurations Only

If a commit fails because the DP4 networking software encountered a network failure, and the resilience mode is such that no network error is generated, the global variable fail_code is set to COMMS_FAILURE.

Additional Commit Failures with AUXDISTR

This section applies to Multiple Resilience Configurations Only

Programs running in a multiple resilience configuration may show an unexpected effect: commit failures can occur at times when they would not have occurred with a single server. Consider the following scenario:

  A. User 1 reads record R, modifies it, posts it back and commits
  B. User 1 reads another record
  C. User 2 reads, modifies and posts back record R, and commits
  D. User 1 reads, modifies and posts back record R, and commits

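In configurations without AUXDISTR running on the client there is no commit failure, but with AUXDISTR user 1 receives a commit failure in transaction D. The reason follows from the rule for commit failures, which in essence is that a commit fails if a record being posted has been modified by another user since the transaction started.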

In the example above, transaction D is regarded as starting from the fetch in stage B, and record R has been modified by user 2 since then.

If there is only one server, the server recognises that the record being posted back in D was not read until after the end of transaction C, so it does not apply the rule as strictly. But with AUXDISTR loaded, the time at which the record was read cannot be determined, because AUXDISTR zeros the timestamps of all records read, in case they are being read from one server and posted to another.
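The scenario can be reproduced in a small toy model. The sketch below is not DP4 code: the Server and Session classes, the logical clock and the names run_scenario and zero_timestamps are all hypothetical, and the commit check is only an assumed reading of the behaviour described above (compare against the read time when it is known, otherwise against the start of the transaction).

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy server: a logical clock plus the last commit time of each record."""
    clock: int = 0
    modified_at: dict = field(default_factory=dict)

    def tick(self) -> int:
        self.clock += 1
        return self.clock


class Session:
    """Toy client; zero_timestamps=True imitates a client with AUXDISTR loaded."""

    def __init__(self, server: Server, zero_timestamps: bool):
        self.server = server
        self.zero_timestamps = zero_timestamps
        self.txn_start = None  # time of the first fetch since the last commit
        self.pending = {}      # record key -> read timestamp (0 means unknown)

    def fetch(self, key: str) -> int:
        now = self.server.tick()
        if self.txn_start is None:
            self.txn_start = now
        # Assumption: a zeroed timestamp hides when the record was read.
        return 0 if self.zero_timestamps else now

    def post(self, key: str, read_ts: int) -> None:
        self.pending[key] = read_ts

    def commit(self) -> bool:
        ok = True
        for key, read_ts in self.pending.items():
            # Known read time: check from there. Zeroed: fall back to txn start.
            since = read_ts if read_ts else self.txn_start
            if self.server.modified_at.get(key, 0) > since:
                ok = False
        if ok:
            now = self.server.tick()
            for key in self.pending:
                self.server.modified_at[key] = now
        self.pending = {}
        self.txn_start = None
        return ok


def run_scenario(zero_timestamps: bool) -> bool:
    """Play stages A to D and return whether user 1's commit in D succeeds."""
    server = Server()
    user1 = Session(server, zero_timestamps)
    user2 = Session(server, zero_timestamps=False)

    user1.post("R", user1.fetch("R"))  # stage A
    user1.commit()
    user1.fetch("S")                   # stage B: opens user 1's next transaction
    user2.post("R", user2.fetch("R"))  # stage C
    user2.commit()
    user1.post("R", user1.fetch("R"))  # stage D
    return user1.commit()
```

In this model run_scenario(False) returns True and run_scenario(True) returns False. The only difference between the two runs is whether the read time of record R in stage D is still available when the commit is checked.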

The remedy is to include a dummy database update in the program run by user 1. The update does not alter any data in the database; it simply marks the start of the next transaction.
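The effect of the remedy can be illustrated with an even smaller sketch. Again this is a hypothetical model rather than DP4 code: the function name, the logical times 3, 5 and 6, and the assumption that the dummy update closes the transaction opened in stage B are all illustrative.

```python
def stage_d_succeeds(dummy_update_before_d: bool) -> bool:
    """Toy check for a client whose read timestamps have been zeroed."""
    # Stage B: user 1's fetch at logical time 3 opens the transaction.
    txn_start = 3

    # Stage C: user 2 commits a change to record R at logical time 5.
    r_modified_at = 5

    if dummy_update_before_d:
        # Assumption: the no-op update ends the transaction opened in
        # stage B, so the next fetch starts a fresh one.
        txn_start = None

    # Stage D: user 1 fetches record R at logical time 6.
    if txn_start is None:
        txn_start = 6

    # With the read time unknown, the commit is checked against txn_start.
    return r_modified_at <= txn_start
```

Here stage_d_succeeds(False) returns False and stage_d_succeeds(True) returns True: once the transaction boundary has moved past user 2's commit, the change to record R no longer falls inside user 1's transaction.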

Future versions of DP4 may fix this problem, as the current behaviour can be very annoying. Unfortunately a fix is unlikely to be easy: unless the timestamp is set to zero, there is a slight possibility that a contention will not be generated when one should be. In fact, the current behaviour may be marginally worse than that of very old releases of AUXDISTR, which passed some calls to fetch() directly to the database manager and did not zero the timestamp.