Debugging

VI. General Guide to Debugging Standalone Unix TRANSP.

1. Debugging existing programs.

If a program (transp or support utility) fails unexpectedly, then the usual practice is to apply debugging tools to learn about the problem.

On unix systems the symbolic debugger is known as dbx. When transp is set up, an alias, debug, is provided that runs dbx with the option it needs to find the transp source code.

Note: (tbt 18 August 1996) At General Atomics, on their HP computers, dbx is not available - xdb is used. Thus the .login file needs

alias debug xdb instead of alias debug dbx

The user can gain familiarity with dbx from the standard unix documentation.

-----------------------------------------------

Note (dmc 1 Dec 1995): better debuggers than dbx are available. They generally cost money but may well be worth the cost if you expect to do a lot of debugging. On our DEC AXP unix systems we use BBN Totalview.

-----------------------------------------------

Debugging is a multi-stage process, involving roughly the following steps:

(a) identifying the part of the program that is failing.

(b) building a debug version of the program with the relevant parts debug compiled.

(c) running the program under interactive debugger control, and studying the precise nature of the program failure.

Of course this can be an iterative process. One may have to run through steps (a) - (b) - (c) several times before locating the root of the trouble.

It is usually best to pay substantial attention to (a) on the first iteration. Careful study of the program output often yields precise information as to what is failing. One should have familiarity with the program being debugged, at least to the extent of knowing what the input and output files are, and where error messages are likely to have been written.

Some systems (e.g. Silicon Graphics) can be configured to write a traceback to standard output in case of floating point exceptions (overflow, divide by zero, etc). Such output might be found e.g. in a TRANSP run log file.

Sometimes, debug information can be gleaned from the standard (non-debug) executable version of the code. For programs other than transp, this executable is the file <program-name> in the directory $LOCAL/exe. For a failed TRANSP run it is a file xxxxxAyyTR.EXE in the appropriate tokamak work directory. For example, the executable for transp run TFTR 37065Z01 is

$WORKDIR/TFTR/37065Z01TR.EXE

If a run fails with an arithmetic error, then a core dump file is usually created. This is a large file "core" written into the working directory of the process running the program at time of failure. If xxxx is the name of the non-debug executable, then the following sequence generates a traceback:

% cd ... (to the directory containing the core dump)

% debug xxxx core (run the dbx debugger on xxxx; read core dump file)

% dbx> where (generate a traceback to the point of failure)

Since xxxx was not built with debug compiled modules, it is not possible to examine the values of any memory locations. Nevertheless, the traceback information gives a strong clue for step (a) in building a debug executable. (This does not work on all machines.)
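The sequence above can be wrapped in a small helper that checks for the core file before suggesting the debugger commands. The following is a minimal sketch in POSIX sh (the TRANSP scripts themselves use csh). The function name core_traceback_plan is hypothetical and not part of TRANSP; it only prints the commands from this section and does not run dbx.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRANSP): print the commands needed to
# get a traceback from a core dump, following the sequence in the text.
# usage: core_traceback_plan <directory> <executable-name>
core_traceback_plan() {
    dir=$1
    exe=$2
    # The core dump is a file named "core" in the working directory
    # of the process that failed.
    if [ ! -f "$dir/core" ]; then
        echo "no core file in $dir" >&2
        return 1
    fi
    echo "cd $dir"
    echo "debug $exe core"   # "debug" is the alias set up with transp
    echo "where"             # typed at the debugger prompt
}
```

For example, core_traceback_plan "$WORKDIR/TFTR" 37065Z01TR.EXE prints the three commands only if a core file is present in that directory.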

Generally, the problem for step (a) is coming up with a list of sources to have debug compiled. For almost all transp software, the source for a given subroutine (e.g. subroutine "stepon") will be in a source file with the same name (e.g. stepon.for) in a subdirectory of $CODESYSDIR/source. The command

% listelem stepon

can usually be used to find out to which library the subroutine belongs; in this case, trcore, and the full source name is

$CODESYSDIR/source/trcore/stepon.for

The same method works for finding the source of driver modules.
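Where listelem is not at hand, the naming convention described above (source <name>.for in a subdirectory of $CODESYSDIR/source) can be checked directly. The following is a minimal sketch in POSIX sh; the function name find_transp_source is hypothetical, and it assumes the sources sit exactly one directory level below the source root.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRANSP): locate <name>.for in the
# library subdirectories of a source root such as $CODESYSDIR/source.
# usage: find_transp_source <source-root> <subroutine-name>
find_transp_source() {
    root=$1
    name=$2
    # Assumed layout: <root>/<library>/<name>.for
    for f in "$root"/*/"$name".for; do
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

An empty result means the subroutine is not stored under its own name, which is the case the allsearch command handles.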

Occasionally, a subroutine is hidden in a source file having some other name. The entire code can be searched for the name by using the command

% allsearch xxxx abcd

which will search all sources for the indicated string abcd. If debugging a transp run, give the name transp for xxxx.

In preparing the list (a), it is necessary only to give the root name for each desired source. The list (a) is maintained in a file known as the "debug environment file". The following commands are available for manipulating the debug environment:

% dbxadd <name1> <name2> ... -- add one or more names to the environment

% dbxclear -- delete all names from the environment

% dbxsave <save-name> -- save current environment under <save-name>; the current environment is cleared.

% dbxrestore <save-name> -- restore previously saved environment

When creating debug executables, copies of libraries, make files, sources, etc., are stored in the debug work area $DBGDIR. All this information can be removed when no longer needed, with:

% dbxcleanup -- delete the current environment, saved environments, and all debug work files.

The current debug environment can be displayed with the command

% dbxlist

Step (b) entails using the debug environment to actually load a debug executable. For a program xxxx other than transp, issue the command

% uplink xxxx debug

to make a local debug copy of xxxx, which will be named xxxx.dbx. The uplink command generates make files on the fly, and performs any debug compiles and makes any temporary debug subroutine libraries needed to create the debug executable containing the debug compiled subroutines indicated in the debug environment.

If you're in a hurry, you can issue the command

% uplink xxxx debug make

and skip all the "uplib" and "makelink" looping that goes on. For details, cf. the "uplink" info in the transp_devel.doc document.

Step (b) for a TRANSP run (say TFTR 37065Z01) is carried out in the work directory for the run. Use the commands

% cd $WORKDIR/TFTR

% uplink 37065Z01db debug

to make the transp debug executable, 37065Z01db.dbx.

Step (c) simply involves running the debug executable under dbx control. Use

% debug xxxx.dbx

for programs other than transp, or

% debug 37065Z01db.dbx

e.g. for transp run 37065Z01.

The debug command invokes the named program under dbx control. Use the standard dbx facilities to set break points, start and stop execution, examine local variables, etc. Alternatively, use Totalview or another proprietary debugger, if available.
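Steps (b) and (c) can be summarized as a short command plan. The following is a minimal sketch in POSIX sh; the function name debug_plan is hypothetical, it only prints the commands described in this section, and starting from a cleared environment with dbxclear is a choice made for this illustration, not a requirement stated above.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRANSP): print the command sequence
# for building and running a debug executable.
# usage: debug_plan <uplink-target> <source-root-name>...
debug_plan() {
    if [ "$#" -lt 2 ]; then
        echo "usage: debug_plan <uplink-target> <source>..." >&2
        return 1
    fi
    target=$1
    shift
    echo "dbxclear"               # illustration only: start from an empty environment
    echo "dbxadd $*"              # list (a): sources to be debug compiled
    echo "uplink $target debug"   # step (b): build <target>.dbx
    echo "debug $target.dbx"      # step (c): run under the debugger
}
```

For a TRANSP run the target would be, e.g., 37065Z01db, with the commands issued from the work directory for the run.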

There may be some workstation dependency in the functioning of dbx. Debuggers often define their own environment, and this may require customization for smooth functioning. The details of this depend on the type of system you are using.

Once a problem has been fixed, and the standard source files have been updated, the command

% uplink xxxx

will create an updated standard (non-debug) version of xxxx in the $CODESYSDIR/exe area.

For a transp run with restart file xxxxxAyyRS.DAT, the command

% runtr xxxxxAyy lrs

will relink and restart the transp run using the modified sources (i.e. to restart a crashed run). If there is no restart file, then

% runtr xxxxxAyy link

will relink the run and start it from the beginning.
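The choice between the two runtr options depends only on whether a restart file exists. The following is a minimal sketch in POSIX sh; the function name runtr_mode is hypothetical, and it assumes the restart file xxxxxAyyRS.DAT is looked for in the directory passed as the first argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRANSP): choose the runtr option.
# usage: runtr_mode <directory> <run-id>
runtr_mode() {
    dir=$1
    runid=$2
    # Assumption: the restart file is <run-id>RS.DAT in <directory>.
    if [ -f "$dir/${runid}RS.DAT" ]; then
        echo lrs    # relink and restart from the restart file
    else
        echo link   # relink and start from the beginning
    fi
}
```

In an sh session the result could be passed straight to runtr, e.g. runtr 37065Z01 "$(runtr_mode "$WORKDIR/TFTR" 37065Z01)".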

If you have a cvs-installed copy of transp and you find and fix bugs, please consider updating the PPPL central repository with your improvements. Please follow protocol and make sure that your changes really are improvements that do not damage others' ability to use the code.

See codesys/source/doc/transp_share.doc for more information on updating the PPPL central code repository.

2. Code development.

The scheme described above is adequate when the problem is to debug an existing code and apply simple patches to existing subroutines. However, additional steps must be taken if code changes are structural in nature.

Structural changes are changes which affect the interrelationship of sources to executables:

change in the contents of subroutine libraries (i.e. adding or deleting subroutines);

change in the list of libraries linked to create a program (i.e. addition or deletion of libraries or modification of the load ordering of libraries);

change in the use of INCLUDE files (adding or removing INCLUDE file references within individual subroutines or driver modules).

To make changes of these kinds, please refer to the instructions in codesys/source/doc/transp_devel.doc

3. Automatic Procedures.

Automated code maintenance procedures will send mail if errors occur; these mail messages are sent to the addresses listed in the file

config/csh_mail.address

if it exists; if not, mail is sent to the user name returned by the `whoami' command.
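The recipient rule can be sketched as follows in POSIX sh. The function name mail_recipients is hypothetical; the sketch assumes that config/csh_mail.address is found relative to the directory passed as argument, and it prints the file contents unchanged, since the file format is not described here.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRANSP): show who would receive mail.
# usage: mail_recipients <directory-containing-config>
mail_recipients() {
    file=$1/config/csh_mail.address
    if [ -f "$file" ]; then
        cat "$file"   # addresses listed in the file
    else
        whoami        # fallback: the current user
    fi
}
```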

After TRANSP is successfully installed, it is recommended that cvs transp sites run the script

$TRANSPROOT/daemon/update.daemon

frequently, to receive updates from the PPPL repository, and to run an incremental make on the code. If any problems occur these will be reported via mail to the above indicated addresses. The full log file for the update job is left in

$TRANSPROOT/log/xxx_update.log

where xxx is Mon, Tue, Wed, ... depending on which day the update job ran.
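The log file name for an update job run today can be derived from the date command. The following is a minimal sketch in POSIX sh; the function name update_log_path is hypothetical, and the root directory is passed as an argument in place of $TRANSPROOT.

```shell
#!/bin/sh
# Hypothetical helper (not part of TRANSP): path of the update log that
# a job run today would leave behind.
# usage: update_log_path <transp-root>
update_log_path() {
    # %a is the abbreviated weekday name; LC_ALL=C forces the
    # English forms Mon, Tue, Wed, ...
    day=$(LC_ALL=C date +%a)
    echo "$1/log/${day}_update.log"
}
```

Since the name depends on the day the job ran, a log from an earlier day has to be looked up under that day's abbreviation.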

Use of update.daemon is also a prerequisite for committing locally developed TRANSP changes; see codesys/source/doc/unix_commit.doc, which also includes a more detailed description of what update.daemon does.