On Fri, 2013-09-27 at 10:06 +0200, Hannes Reinecke wrote:
> On 09/27/2013 08:07 AM, Hannes Reinecke wrote:
> > On 09/27/2013 01:49 AM, Mike Snitzer wrote:
> >> On Thu, Sep 26 2013 at 7:22pm -0400,
> >> Alasdair G Kergon <agk@xxxxxxxxxx> wrote:
> >>
> >>> On Thu, Sep 26, 2013 at 10:47:13AM -0700, Frank Mayhar wrote:
> >>>> Launching it from ramdisk won't help, particularly, since it still goes
> >>>> through the block layer. The other stuff won't help if a (potentially
> >>>> unrelated) bug in the daemon happens to be being tickled at the same
> >>>> time, or if some dependency happens to be broken and _that's_ what's
> >>>> preventing the daemon from making progress.
> >>>
> >>> Then put more effort into debugging your daemon so it doesn't have
> >>> bugs that make it die? Implement the timeout in a robust independent
> >>> daemon if it's other code there that's unreliable?
> >>>
> >>>> And as far as lvm2 and multipath-tools, yeah, they cope okay in the kind
> >>>> of environments most people have, but that's not the kind of environment
> >>>> (or scale) we have to deal with.
> >>>
> >>> In what way are your requirements so different that a locked-into-memory
> >>> monitoring daemon cannot implement this timeout?
> >>
> >> Frank, I had a look at your patch. It leaves a lot to be desired, I was
> >> starting to clean it up but ultimately found myself agreeing with
> >> Alasdair's original point: that this policy should be implemented in the
> >> userspace daemon.
> >>
> > _Actually_ there is a way how this could be implemented properly:
> > implement a blk_timeout function.
> >
> > Thing is, every request_queue might have a timeout function
> > implemented, whose goal is to abort requests which are beyond that
> > timeout. EG SCSI uses that for the dev_loss_tmo mechanism.
> >
> > Multipath, what with it being request-based, could easily implement
> > the same mechanism, namely have a blk_timeout function which would
> > just re-arm the timeout in the default case, but abort any queued
> > I/O (after a timeout) if all paths are down.
> >
> > Hmm. I'll see if I can draft up a PoC.
> >
> And indeed, here it is.
>
> Completely untested, just to give you an idea what I was going on
> about. Let's see if I can put this to test somewhere...

Thanks, Hannes! I'll grab this and test it today. I clearly don't know
enough about the block layer, since using blk_timeout never even crossed
my mind.
--
Frank Mayhar
310-460-4042