On Wed, Aug 10, 2005 at 05:49:40PM -0400, Alan Kasindorf wrote:
> Hey,
>
> At some random point in time today, one of the machines lost one of its
> four 3par mounts. All other mounts worked fine. This has happened once
> or twice before as well, but we rebooted before I had time to inspect
> the issue.
>
> Is this known at all? Is there anything else I can provide so that we
> can figure out why this happened? I had been running multipath tools for
> two months on a test box and never encountered this problem. It's only
> snuck up as we've started deploying it on more machines for

I've had problems like this happen to me on 3par too. What kernel
version are you using? It almost always happened when the SAN got an
RSCN (usually when another server was rebooted).

I found that, at least in kernel 2.6.11.7, if I changed the line

    bio->bi_rw |= (1 << BIO_RW_FAILFAST);

to

    bio->bi_rw |= (0 << BIO_RW_FAILFAST);

in drivers/md/dm-mpath.c, the problem went away.

Now, in the newest kernels, after there was a big change to the qla
drivers (2.6.12-rc? and beyond, I believe), I did not need to make the
above change, but I now sometimes get aborts (these aborts apparently
come from the QLogic card). The aborts recover, but I have been unable
to determine why I am getting them.

Andy
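
For anyone wondering why that one-character edit has any effect, here is a
minimal userspace sketch. It assumes the line in question is a bitwise
OR-assignment (|=) into bio->bi_rw. The bit position given to
BIO_RW_FAILFAST is a placeholder rather than the value from the kernel
headers, and struct fake_bio, map_stock(), map_patched() and is_failfast()
are hypothetical names used only for illustration:

```c
/* Placeholder bit position; the real value is defined in the kernel headers. */
#define BIO_RW_FAILFAST 3

/* Hypothetical stand-in for the kernel's struct bio, keeping only bi_rw. */
struct fake_bio {
    unsigned long bi_rw;
};

/* Stock line: OR in a 1 at the failfast position, marking the bio failfast. */
static void map_stock(struct fake_bio *bio)
{
    bio->bi_rw |= (1UL << BIO_RW_FAILFAST);
}

/* Edited line: 0 shifted by any amount is still 0, so the OR is a no-op
 * and the failfast bit keeps whatever value it already had. */
static void map_patched(struct fake_bio *bio)
{
    bio->bi_rw |= (0UL << BIO_RW_FAILFAST);
}

/* Returns 1 if the failfast bit is set on the bio, 0 otherwise. */
static int is_failfast(const struct fake_bio *bio)
{
    return (int)((bio->bi_rw >> BIO_RW_FAILFAST) & 1UL);
}
```

Calling map_stock() on a zeroed fake_bio sets the bit, while map_patched()
leaves bi_rw untouched. In other words, the edit amounts to no longer
requesting failfast behaviour on multipath I/O; it does not clear the bit if
something else already set it.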