On Fri, Jan 12 2018 at 8:37pm -0500,
Mike Snitzer <snitzer@xxxxxxxxxx> wrote:

> On Fri, Jan 12 2018 at 8:00pm -0500,
> Bart Van Assche <Bart.VanAssche@xxxxxxx> wrote:
>
> > On Fri, 2018-01-12 at 19:52 -0500, Mike Snitzer wrote:
> > > It was 50 ms before it was 100 ms. No real explanation for these
> > > values other than they seem to make Bart's IB SRP testbed happy?
> >
> > But that constant was not introduced by me in the dm code.
>
> No, actually it was (not that there's anything wrong with that):
>
> commit 06eb061f48594aa369f6e852b352410298b317a8
> Author: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Date: Fri Apr 7 16:50:44 2017 -0700
>
>     dm mpath: requeue after a small delay if blk_get_request() fails
>
>     If blk_get_request() returns ENODEV then multipath_clone_and_map()
>     causes a request to be requeued immediately. This can cause a kworker
>     thread to spend 100% of the CPU time of a single core in
>     __blk_mq_run_hw_queue() and also can cause device removal to never
>     finish.
>
>     Avoid this by only requeuing after a delay if blk_get_request() fails.
>     Additionally, reduce the requeue delay.
>
>     Cc: stable@xxxxxxxxxxxxxxx # 4.9+
>     Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
>     Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
>
> Note that this commit actually details a different case where a
> blk_get_request() (in existing code) return of -ENODEV is a very
> compelling case to use DM_MAPIO_DELAY_REQUEUE.
>
> So I'll revisit what is appropriate in multipath_clone_and_map() on
> Monday.

Sleep helped. I had another look, and it is only the old .request_fn
blk_get_request() code that even sets -ENODEV (if blk_queue_dying()).

But thankfully the blk_get_request() error handling in
multipath_clone_and_map() checks for blk_queue_dying() and will return
DM_MAPIO_DELAY_REQUEUE. So we're all set for this case.

Mike
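
For anyone following along, the decision described above can be modelled in a few lines of userspace C. This is only a sketch and not the dm-mpath source: struct mock_queue, enum map_result and model_clone_and_map() are hypothetical stand-ins, and the enum values are not the kernel's DM_MAPIO_* values. The dying-queue branch follows the message above. The immediate requeue for the non-dying failure case is an assumption made here for illustration, since the message does not state it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the DM_MAPIO_* results; values are arbitrary. */
enum map_result {
	MAP_REMAPPED,       /* clone allocated, request mapped to a path */
	MAP_REQUEUE,        /* requeue immediately (assumed for non-dying failures) */
	MAP_DELAY_REQUEUE   /* requeue after a delay */
};

/* Hypothetical model of the underlying path's request queue. */
struct mock_queue {
	bool dying;         /* models blk_queue_dying(q) being true */
	bool alloc_fails;   /* models blk_get_request() returning an error */
};

/*
 * Models the error handling discussed above: when the clone cannot be
 * allocated and the queue is dying, delay the requeue so a worker does
 * not spin retrying against a queue that is going away.
 */
static enum map_result model_clone_and_map(const struct mock_queue *q)
{
	if (q->alloc_fails) {
		if (q->dying)
			return MAP_DELAY_REQUEUE;
		return MAP_REQUEUE; /* assumption, see note above */
	}
	return MAP_REMAPPED;
}

/* Small self-check exercising the three outcomes of the model. */
static void model_self_check(void)
{
	struct mock_queue ok    = { .dying = false, .alloc_fails = false };
	struct mock_queue busy  = { .dying = false, .alloc_fails = true };
	struct mock_queue dying = { .dying = true,  .alloc_fails = true };

	assert(model_clone_and_map(&ok) == MAP_REMAPPED);
	assert(model_clone_and_map(&busy) == MAP_REQUEUE);
	assert(model_clone_and_map(&dying) == MAP_DELAY_REQUEUE);
}
```

The point of the model is just that the dying-queue case never takes the immediate-requeue path, which is what avoids the busy loop described in the quoted commit message.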