On 6/14/07, Paul Clements <paul.clements@xxxxxxxxxxxx> wrote:
Mike Snitzer wrote: > Here are the steps to reproduce reliably on SLES10 SP1: > 1) establish a raid1 mirror (md0) using one local member (sdc1) and > one remote member (nbd0) > 2) power off the remote machine, whereby severing nbd0's connection > 3) perform IO to the filesystem that is on the md0 device to enduce > the MD layer to mark the nbd device as "faulty" > 4) cat /proc/mdstat hangs, sysrq trace was collected That's working as designed. NBD works over TCP. You're going to have to wait for TCP to time out before an error occurs. Until then I/O will hang.
With kernel.org 2.6.15.7 (uni-processor) I've not seen NBD hang in the kernel like I am with RHEL5 and SLES10. This hang (tcp timeout) is indefinite oh RHEL5 and ~5min on SLES10. Should/can I be playing with TCP timeout values? Why was this not a concern with kernel.org 2.6.15.7; I was able to "feel" the nbd connection break immediately; no MD superblock update hangs, no longwinded (or indefinite) TCP timeout. regards, Mike - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html