>mpirun -np 4 -host jac-123,jac-122,jac-121,jac-120 mpiPi
/bin/tcsh running /home/jconboy/.tcshrc level 1
/bin/tcsh running /home/jconboy/.tcshrc level 1
/bin/tcsh running /home/jconboy/.tcshrc level 1
1 > MPI_BCAST
=========================================================
Enter the number of intervals: or Quit : Master / Slaves
0 > FINALIZE / FINALIZE
-1 > FINALIZE / BCAST
-2 > ABORT / BCAST
3 > MPI_BCAST
2 > MPI_BCAST
1000000
Enter 0 < nerr < 1000000 to provoke segv or 0 > nerr > -1000000 to call mpi_abort ( slave ) mpi_finalize ( master )
0
3 > MPI_BCAST
1 > MPI_BCAST
2 > MPI_BCAST
=========================================================
pi is approximately: 3.1415926535899030
Error is : 0.0000000000001101
=========================================================
Enter the number of intervals: or Quit : Master / Slaves
0 > FINALIZE / FINALIZE
-1 > FINALIZE / BCAST
0 > MPI_BCAST
-2 > ABORT / BCAST
0
1 > MPI_FINALIZE
2 > MPI_FINALIZE
=========================================================
3 > MPI_FINALIZE
0 > MPI_BCAST
0 > MPI_FINALIZE
0 MPI_FINALIZE > 0
1 MPI_FINALIZE > 0
2 MPI_FINALIZE > 0
3 MPI_FINALIZE > 0
>
1000000 step calculation, followed by 'normal' termination ( all processes in MPI_FINALIZE )
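The source of mpiPi is not shown here. As a rough sketch, assuming it follows the classic midpoint-rule integration of 4/(1+x^2) over [0, 1] with the intervals dealt out cyclically to the ranks, the 4-process calculation can be mimicked in plain Python without MPI. The function names `partial_sum` and `estimate_pi` are hypothetical and the final sum only stands in for a reduction onto rank 0.

```python
import math


def partial_sum(rank, size, n):
    """Midpoint-rule contribution of one rank to the integral of 4/(1+x^2) on [0, 1].

    Assumption: intervals are distributed cyclically, so rank r takes
    intervals r+1, r+1+size, r+1+2*size, ... (not confirmed by the source).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(rank + 1, n + 1, size):
        x = h * (i - 0.5)          # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(n, size=4):
    """Add up the per-rank contributions (stand-in for a reduction to rank 0)."""
    return math.fsum(partial_sum(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    n = 1000000                    # same interval count as in the transcript
    pi_est = estimate_pi(n)
    print(f"pi is approximately: {pi_est:.16f}")
    print(f"Error is           : {abs(pi_est - math.pi):.16f}")
```

With n = 1000000 the error should be of roughly the same order of magnitude as the one printed in the transcript, though the exact digits will differ because the summation order is not the same.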
pi is approximately: 3.1415926535899030
Error is : 0.0000000000001101
=========================================================
Enter the number of intervals: or Quit : Master / Slaves
0 > FINALIZE / FINALIZE
-1 > FINALIZE / BCAST
-2 > ABORT / BCAST
0 > MPI_BCAST
-1
1 > MPI_BCAST
2 > MPI_BCAST
3 > MPI_BCAST
=========================================================
0 > MPI_BCAST
0 > MPI_FINALIZE
jac-120
jconboy 12389 1 0 11:00 ? Ss 0:00 orted --bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0
jconboy 12390 12389 89 11:00 ? R 3:03 \_ mpiPi
jac-121
jconboy 1152 1 0 11:00 ? Ss 0:00 orted --bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0
jconboy 1153 1152 95 11:00 ? R 3:15 \_ mpiPi
jac-122
jconboy 12117 1 0 11:00 ? Ss 0:00 orted --bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0
jconboy 12118 12117 99 11:00 ? R 3:26 \_ mpiPi
jac-123
jconboy 13825 2478 0 11:00 pts/5 S+ 0:00 | \_ mpirun -np 4 -host jac-123,jac-122,jac-121,jac-120 mpiPi
jconboy 13828 1 0 11:00 ? Ss 0:00 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0
jconboy 13837 13828 0 11:00 ? S 0:00 \_ mpiPi
Abnormal termination - master calls MPI_FINALIZE while slaves wait in MPI_BCAST
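Why this case leaves the job hanging can be illustrated with a toy model. The sketch below is not MPI: it uses Python threads and a `threading.Barrier` as a stand-in for a collective call that every rank must enter, and the name `run_job` is hypothetical. The timeout exists only so the demonstration ends; a real MPI_BCAST has no such timeout, so the waiting ranks would simply keep waiting.

```python
import threading


def run_job(master_joins_bcast, n_ranks=4, timeout=0.5):
    """Model one loop iteration in which all ranks are expected to meet in a collective."""
    bcast = threading.Barrier(n_ranks)   # stand-in for a collective over all ranks
    state = {}

    def rank_main(rank):
        if rank == 0 and not master_joins_bcast:
            # The master leaves the loop and "finalizes" without joining the collective.
            state[rank] = "finalized early"
            return
        try:
            bcast.wait(timeout=timeout)  # assumption: timeout added only for the demo
            state[rank] = "bcast completed"
        except threading.BrokenBarrierError:
            # Not every rank arrived, so the collective never completes.
            state[rank] = "stuck in bcast"

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


if __name__ == "__main__":
    print(run_job(master_joins_bcast=True))
    print(run_job(master_joins_bcast=False))
```

When the master joins, every rank reports that the collective completed. When it does not, ranks 1 to 3 end up waiting for a partner that never arrives, which matches the still-running mpiPi processes in the process listing for this case.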
pi is approximately: 3.1415926535899030
Error is : 0.0000000000001101
=========================================================
Enter the number of intervals: or Quit : Master / Slaves
0 > FINALIZE / FINALIZE
-1 > FINALIZE / BCAST
-2 > ABORT / BCAST
0 > MPI_BCAST
-2
2 > MPI_BCAST
1 > MPI_BCAST
3 > MPI_BCAST
=========================================================
0 > MPI_BCAST
0 > MPI_ABORT
[jac-123:14677] MPI_ABORT invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode -2
mpirun noticed that job rank 1 with PID 13442 on node jac-122 exited on signal 15 (Terminated).
2 additional processes aborted (not shown)
=========================================================
1 > MPI_BCAST
Enter the number of intervals: or Quit : Master / Slaves
0 > FINALIZE / FINALIZE
-1 > FINALIZE / BCAST
-2 > ABORT / BCAST
2 > MPI_BCAST
3 > MPI_BCAST
1000000
Enter 0 < nerr < 1000000 to provoke segv or 0 > nerr > -1000000 to call mpi_abort ( slave ) mpi_finalize ( master )
10003
Proc 1 error value 10003
Proc 3 error value 10003
Proc 2 error value 10003
=========================================================
Proc 0 error value 10003
2 > Segv
[jac-121:03432] *** Process received signal ***
[jac-121:03432] Signal: Segmentation fault (11)
[jac-121:03432] Signal code: Address not mapped (1)
[jac-121:03432] Failing at address: 0x7c7aeb0
[jac-121:03432] [ 0] [0x57f440]
[jac-121:03432] [ 1] mpiPi(MAIN__+0x177) [0x80493a7]
[jac-121:03432] [ 2] mpiPi(main+0x39) [0x804921d]
[jac-121:03432] [ 3] /lib/libc.so.6(__libc_start_main+0xdc) [0x860f2c]
[jac-121:03432] [ 4] mpiPi [0x80491e1]
[jac-121:03432] *** End of error message ***
0 > MPI_BCAST
3 > MPI_BCAST
1 > MPI_BCAST
mpirun noticed that job rank 0 with PID 14977 on node jac-123 exited on signal 15 (Terminated).
3 additional processes aborted (not shown)
>
Segv for process 2
=========================================================
Enter the number of intervals: or Quit : Master / Slaves
0 > FINALIZE / FINALIZE
-1 > FINALIZE / BCAST
-2 > ABORT / BCAST
3 > MPI_BCAST
2 > MPI_BCAST
1 > MPI_BCAST
1000000
Enter 0 < nerr < 1000000 to provoke segv or 0 > nerr > -1000000 to call mpi_abort ( slave ) mpi_finalize ( master )
-10004
Proc 1 error value -10004
Proc 2 error value -10004
Proc 3 error value -10004
=========================================================
Proc 0 error value -10004
1 > MPI_BCAST
2 > MPI_BCAST
0 > MPI_BCAST
3 > MPI_ABORT
[jac-120:03448] MPI_ABORT invoked on rank 3 in communicator MPI_COMM_WORLD with errorcode 1
mpirun noticed that job rank 0 with PID 15222 on node jac-123 exited on signal 15 (Terminated).
2 additional processes aborted (not shown)
>
Termination by MPI_ABORT call during loop, process 3
Enter the number of intervals: or Quit : Master / Slaves
0 > FINALIZE / FINALIZE
-1 > FINALIZE / BCAST
-2 > ABORT / BCAST
3 > MPI_BCAST
2 > MPI_BCAST
1000000
Enter 0 < nerr < 1000000 to provoke segv or 0 > nerr > -1000000 to call mpi_abort ( slave ) mpi_finalize ( master )
-10001
=========================================================
Proc 0 error value -10001
Proc 2 error value -10001
0 > MPI_BCAST
0 > MPI_FINALIZE
Proc 1 error value -10001
Proc 3 error value -10001
1 > MPI_BCAST
3 > MPI_BCAST
2 > MPI_BCAST
^Z
Suspended
>kill %2
>mpirun: killing job...
mpirun noticed that job rank 0 with PID 15288 on node jac-123 exited on signal 15 (Terminated).
3 additional processes aborted (not shown)
jac-120
jconboy 5004 1 0 11:32 ? Ss 0:00 orted --bootproxy 1 --name 0.0.4 --num_procs 5 --vpid_start 0
jconboy 5005 5004 88 11:32 ? R 0:23 \_ mpiPi
jac-121
jconboy 4906 1 0 11:32 ? Ss 0:00 orted --bootproxy 1 --name 0.0.3 --num_procs 5 --vpid_start 0
jconboy 4907 4906 91 11:32 ? R 0:25 \_ mpiPi
jac-122
jconboy 15395 1 0 11:32 ? Ss 0:00 orted --bootproxy 1 --name 0.0.2 --num_procs 5 --vpid_start 0
jconboy 15396 15395 97 11:32 ? R 0:28 \_ mpiPi
jac-123
jconboy 15276 2478 0 11:32 pts/5 S+ 0:00 | \_ mpirun -np 4 -host jac-123,jac-122,jac-121,jac-120 mpiPi --
jconboy 15279 1 0 11:32 ? Ss 0:00 orted --bootproxy 1 --name 0.0.1 --num_procs 5 --vpid_start 0
jconboy 15288 15279 0 11:32 ? S 0:00 \_ mpiPi
Job hangs after MPI_FINALIZE call during loop, process 0