Re: dm-mq and end_clone_request()

Laurence Oberman <loberman@xxxxxxxxxx> · Fri, 5 Aug 2016 11:39:05 -0400 (EDT)

----- Original Message -----
> From: "Laurence Oberman" <loberman@xxxxxxxxxx>
> To: "Mike Snitzer" <snitzer@xxxxxxxxxx>
> Cc: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, dm-devel@xxxxxxxxxx, linux-scsi@xxxxxxxxxxxxxxx
> Sent: Friday, August 5, 2016 7:43:30 AM
> Subject: Re: dm-mq and end_clone_request()
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman@xxxxxxxxxx>
> > To: "Mike Snitzer" <snitzer@xxxxxxxxxx>
> > Cc: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, dm-devel@xxxxxxxxxx,
> > linux-scsi@xxxxxxxxxxxxxxx
> > Sent: Thursday, August 4, 2016 9:07:28 PM
> > Subject: Re: dm-mq and end_clone_request()
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Mike Snitzer" <snitzer@xxxxxxxxxx>
> > > To: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>
> > > Cc: dm-devel@xxxxxxxxxx, "Laurence Oberman" <loberman@xxxxxxxxxx>,
> > > linux-scsi@xxxxxxxxxxxxxxx
> > > Sent: Thursday, August 4, 2016 7:58:50 PM
> > > Subject: Re: dm-mq and end_clone_request()
> > > 
> > > I've staged another fix, Laurence is seeing success with this added:
> > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=d50a6450104c237db1dc75314d17b78c990a8c05
> > > 
> > > I'll be sending all the fixes I've queued to Linus tonight or early
> > > tomorrow (since I'll then be on vacation until Monday 8/15).
> > > 
> > Hello Bart,
> > 
> > I applied that patch to your kernel, and while I still see all the debug
> > logging, it's no longer failing fio for me.
> > I ran 8 loops with 20 parallel fio runs. This was on a different server to
> > the one I had been testing on.
> > 
> > However, I am concerned that timing plays a part here, so let us know
> > what you find.
> > 
> > Thanks
> > Laurence
> Replying to my own message:
> 
> Hi Bart, Mike
> 
> Further testing has shown we are still exposed here, so more investigation is
> necessary.
> The above patch seems to help, but I still see sporadic cases of errors
> escaping up the stack.
> 
> I expect you will see the same, so there is more work to do here to figure this out.
> 
> Thanks
> Laurence
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Hello Bart

I completely forgot I had set no_path_retry=12, so after 12 retries it will error out.
This is likely why my results seemed to depend on timing.
Mike reminded me of it this morning.
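
To make the timing dependence concrete, here is a rough model of the three no_path_retry modes. It assumes that multipathd consumes one retry per path-checker pass and that passes run every polling_interval seconds (5 seconds is assumed here as the default; the real interval depends on the configuration). queueing_window() and io_outcome() are hypothetical helpers for illustration only, not multipath-tools code.

```python
# Hypothetical model of the no_path_retry policy (NOT multipath-tools code).
# Assumption: one retry is consumed per path-checker pass, and passes run
# every polling_interval seconds while all paths are down.

ASSUMED_POLLING_INTERVAL = 5  # seconds; assumed default, check multipath.conf


def queueing_window(no_path_retry, polling_interval=ASSUMED_POLLING_INTERVAL):
    """Seconds that I/O stays queued after all paths are lost.

    Returns None for "queue" (queue indefinitely) and 0 for "fail"
    (error out immediately).
    """
    if no_path_retry == "queue":
        return None
    if no_path_retry == "fail":
        return 0
    if isinstance(no_path_retry, int) and no_path_retry > 0:
        # N retries, one per checker pass.
        return no_path_retry * polling_interval
    raise ValueError(f"unsupported no_path_retry value: {no_path_retry!r}")


def io_outcome(no_path_retry, seconds_without_paths,
               polling_interval=ASSUMED_POLLING_INTERVAL):
    """Whether pending I/O is still queued or has been failed upward."""
    window = queueing_window(no_path_retry, polling_interval)
    if window is None or seconds_without_paths < window:
        return "queued"
    return "error returned up the stack"


if __name__ == "__main__":
    print(queueing_window(12))   # 60 seconds under the assumptions above
    print(io_outcome(12, 45))    # paths back within the window
    print(io_outcome(12, 90))    # paths back too late
```

Under these assumptions, with no_path_retry=12 a test run only passes if the paths return within roughly a minute, which would explain results that vary from run to run.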

What do you have no_path_retry set to? When I set it to queue, it blocks the paths from coming back for some reason.
I am now investigating why that is happening :).
I now see that I need to add "simultaneous all paths lost" scenarios to my QA testing, as it's not a common scenario.

Thanks
Laurence