OK, some more investigation seems to point to the heartbeat thread not being woken up when its timer fires. This might simply be because other, higher-priority tasks are running on the system due to the IO load. Now that I've upgraded my cluster to 2.6.10, iSCSI seems to be more stable (same iSCSI software, interestingly), so I've set the heartbeat nice level to -20 (the same as the iSCSI process) and I'll see if it survives the weekend. It has survived overnight so far, which is better than I've managed yet 8)

-- patrick