Dragging this back up into the light...

On Thu, 2013-09-26 at 19:49 -0400, Mike Snitzer wrote:
> Frank, I had a look at your patch. It leaves a lot to be desired, I was
> starting to clean it up but ultimately found myself agreeing with
> Alasdair's original point: that this policy should be implemented in the
> userspace daemon.

I've found and fixed a couple of bugs but I would still like to know
what issues you had with the patch. As I said before, I would be more
than happy to clean it up.

In the time since we had this discussion, by the way, we ran into a
problem that a userspace daemon can't solve: That of shutdown.

We ran into a number of failures in which systems were hung for hours.
It turned out that they were caused by a regular system shutdown. Our
backing store is network-based and networking was getting killed before
applications (as is usually the case), leaving I/O outstanding on the
device. Since queue_if_no_path was set, the I/O wasn't dumped and our
daemon was killed by shutdown very shortly thereafter so it couldn't
recover (otherwise it would have cleaned things up). With those I/Os
sitting queued in multipath, with no network and no daemon to turn off
queue_if_no_path, the systems just sat.

When we finally diagnosed this, we realized that the timeout would work
perfectly to solve the problem, automatically turning queue_if_no_path
off shortly after the network went away without depending on the
intervention of the no-longer-running daemon.

So how do you guys deal with this failure scenario?
-- 
Frank Mayhar
310-460-4042

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel