Troubleshooting

When DP4 networking is in use, applications may not be able to report errors until they have connected to the network server. If a problem occurs before then, the application may simply fail silently (it will usually return an exit code of 5 when this happens), which can make the problem harder to investigate. (This should not happen with DP4 Enterprise releases.)

If a call to the DP4 system fails because of a network problem, you will normally encounter a FAIL 7 error message. Additional information on the various error codes associated with this message is given here.

If you have trouble making or keeping connections with DP4 networking, there are diagnostic facilities that may be of use. However, these diagnostics are primarily designed for use by Itim Technology Solutions, and you may find them difficult to interpret. You can find general information on enabling DP4 diagnostics in the Guide to DP4 configuration here.

On Windows (including Windows CE), and, from 4.525, on Unix and Linux, the diagnostics are (at least optionally) written to the DP4 error log program. The easiest way to turn on DP4 error logging is to stop the DP4 service (if it is started) and reload it like this:
srvw32 -start -load_errlog -debug_echo
On Windows CE use srvwce, and on Unix and Linux use dbdaemon, instead of srvw32.

You can turn on error logging by default by setting the entry for load_errlog in the [system] section of the DP4 configuration file to 1. Error logs are usually written to the file debug.out (the name is also a DP4 configuration file entry), which will be located in the current directory for the DP4 service (normally the directory where it is installed). On Windows NT/2000 you must enable the DP4 service to interact with the desktop for the error log to output to the screen. Alternatively, you can use the -load option in place of -start when you run srvw32.
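As a quick check that the setting described above is actually present, the following Python sketch reads configuration text and reports whether load_errlog is set to 1 in the [system] section. It is not part of DP4. It assumes the configuration file uses plain INI-style `name=value` lines, which this page does not confirm, and the function name errlog_enabled and the sample text are illustrative only.

```python
import configparser

# Hypothetical sample; the real DP4 configuration file will contain more entries.
SAMPLE_CONFIG = """\
[system]
load_errlog=1
"""


def errlog_enabled(config_text):
    """Return True if [system] load_errlog is set to 1 in the given text."""
    # strict=False tolerates duplicate entries; interpolation=None leaves
    # any '%' characters in other values untouched.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    # A missing section or entry is treated as "not enabled".
    value = parser.get("system", "load_errlog", fallback="0")
    return value.strip() == "1"


if __name__ == "__main__":
    print("error logging enabled by default:", errlog_enabled(SAMPLE_CONFIG))
```

To check a real file, read its contents into a string and pass that to errlog_enabled in place of the sample.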

If you are using 4.523 on Unix or Linux, the diagnostics are written to various files and cannot be seen on screen unless you also specify -echo on the command lines for the DP4 network programs. See DP4 Networking Notes for Unix and Linux (4.523) for more details.

There are three levels of diagnostics, enabled by options on the command line to the network requester and manager:

  1. -debug_error, available from 4.520 onwards, causes diagnostics to be written only when there is an error condition: either a bad return code from the operating system, or a system error or fail error from DP4. This option can safely be left turned on all the time, and may help to diagnose intermittent problems.
  2. -debug_connect, available from 4.621 onwards, includes the effect of -debug_error (causing diagnostics to be written when there is an error condition), and adds diagnostics related to the making and breaking of connections between the requesters and servers, which are produced whether or not there is an error condition. This option can also safely be left turned on all the time, and may help to diagnose intermittent problems.
  3. -debug turns on diagnostics for every stage of every operation in the network manager or requester. It should normally be used only when you have trouble getting or maintaining a connection at all. This option has a severe impact on performance, and should never be used in a production environment.

Troubleshooting TCP/IP: things to try

-nonames Option

If you have trouble making a connection, it is worth experimenting with the -nonames option on tcpmgr. By default tcpmgr attempts to discover the hostname for all incoming connections. Occasionally, particularly in a network containing machines running a mixture of operating systems (especially non-Windows operating systems), this call can take a VERY long time to return. This can cause new connections to fail, and existing connections to time out.
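To judge whether slow hostname discovery is a plausible cause before changing the tcpmgr command line, you can time a reverse lookup for a client address from the server machine. The Python sketch below is not part of DP4. It assumes that the lookup tcpmgr performs behaves like the standard reverse name lookup (socket.gethostbyaddr), which this page does not state; the function name time_reverse_lookup and the address 127.0.0.1 are illustrative only.

```python
import socket
import time


def time_reverse_lookup(ip_address):
    """Return (hostname or None, elapsed seconds) for one reverse lookup."""
    start = time.monotonic()
    try:
        hostname = socket.gethostbyaddr(ip_address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses;
        # a failed lookup is still worth timing.
        hostname = None
    return hostname, time.monotonic() - start


if __name__ == "__main__":
    # Replace with the addresses of the clients that have trouble connecting.
    for address in ["127.0.0.1"]:
        name, seconds = time_reverse_lookup(address)
        print(f"{address}: {name or 'no hostname'} ({seconds:.2f}s)")
```

If a lookup for a client address takes many seconds, or fails only after a long delay, that is a reasonable hint that -nonames is worth trying.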

-nokeepalive Option

Occasionally, on some networks (again, particularly in mixed environments), the DP4 server may report that a client has died because it failed to send a keepalive message. In older versions of DP4 this could happen when the server was busy, because it incorrectly checked the keepalive timeout without first making sure that no more messages were pending. However, because TCP/IP packets are not necessarily sent at the moment DP4 requests it (because of buffering inside the TCP/IP stack itself), a defective TCP/IP implementation can fail to send DP4 keepalives when it should. In this case, using the -nokeepalive option on the server will cure the spurious timeout. You should not use the -nokeepalive option unless necessary, as it prevents the detection of failed client machines. If you specify it on the server, you should also specify it on all the clients; otherwise idle requesters still send keepalive messages every few seconds, which are now unnecessary and harmless but consume bandwidth.

If you use this option, then if a client is rebooted or turned off, or fails unexpectedly in some other way, the connection at the server end will not be terminated and cleaned up automatically. However, if the client is restarted the old connection will be cleaned up, assuming it has the same IP address as before (which will normally be the case unless DHCP is in use and the machine's lease on the IP address has expired).

-socket n Option

It is possible, though unlikely in practice, for DP4 TCP/IP networking to fail because of a conflict with other software that happens to use the same port number as DP4. You can force DP4 to use a different port number (the default is 5000) by specifying a value with the -socket option on the command line, for example -socket 6000. You should avoid port numbers that are commonly used by programs such as FTP, Telnet, etc. Numbers above 5000 are probably best. If used, this command tail must be specified, with the same value, everywhere.

The use of port 5000 can cause problems on Windows XP machines. Please refer to article A4000024.htm on the DP4 Knowledge Base for details.
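Before choosing a value for the socket option, it can help to check whether a candidate port is already taken on the server machine. The Python sketch below is not part of DP4. It simply tries to bind a TCP socket to the port, on the assumption that DP4 listens on an ordinary TCP port; the function name port_is_free is illustrative only. Run it while the DP4 service is stopped, otherwise DP4 itself will show up as the program using the port.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": another program has the port.
            return False
        return True


if __name__ == "__main__":
    # 5000 is the DP4 default named above; 6000 is the example alternative.
    for candidate in (5000, 6000):
        state = "free" if port_is_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
```

A port reported as free here can still be claimed later by software that starts on demand, so treat the result as a hint rather than a guarantee.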

Mixed networks

If you are trying to connect from a Windows TCP/IP client to another operating system, particularly a legacy operating system such as FlexOS or IBM 4690 OS, you may have problems because of incompatibilities in the underlying network protocol. This is a distinct possibility if you are using Windows 98/ME/2000/XP or later, but is much less likely if you are using Windows 95/NT4.

For Windows 2000/XP try adding the following DWORD values to the registry at
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters :
SackOpts=0
tcp1323Opts=0

The machine must be rebooted for these options to take effect. Disabling/Re-enabling the network connection has no effect.