----- Original Message -----
> From: "Laurence Oberman" <loberman@xxxxxxxxxx>
> To: "Mike Snitzer" <snitzer@xxxxxxxxxx>
> Cc: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, dm-devel@xxxxxxxxxx, linux-scsi@xxxxxxxxxxxxxxx
> Sent: Friday, August 5, 2016 7:43:30 AM
> Subject: Re: dm-mq and end_clone_request()
>
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman@xxxxxxxxxx>
> > To: "Mike Snitzer" <snitzer@xxxxxxxxxx>
> > Cc: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, dm-devel@xxxxxxxxxx,
> > linux-scsi@xxxxxxxxxxxxxxx
> > Sent: Thursday, August 4, 2016 9:07:28 PM
> > Subject: Re: dm-mq and end_clone_request()
> >
> > ----- Original Message -----
> > > From: "Mike Snitzer" <snitzer@xxxxxxxxxx>
> > > To: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>
> > > Cc: dm-devel@xxxxxxxxxx, "Laurence Oberman" <loberman@xxxxxxxxxx>,
> > > linux-scsi@xxxxxxxxxxxxxxx
> > > Sent: Thursday, August 4, 2016 7:58:50 PM
> > > Subject: Re: dm-mq and end_clone_request()
> > >
> > > I've staged another fix, Laurence is seeing success with this added:
> > > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.8&id=d50a6450104c237db1dc75314d17b78c990a8c05
> > >
> > > I'll be sending all the fixes I've queued to Linus tonight or early
> > > tomorrow (since I'll then be on vacation until Monday 8/15).
> > >
> > Hello Bart,
> >
> > I applied that patch to your kernel and while I still obviously see all the
> > debug logging, it's no longer failing fio for me.
> > I ran 8 loops with 20 parallel fio runs. This was on a different server to
> > the one I had been testing on.
> >
> > However, I am concerned about timing playing a part here, so let us know
> > what you find.
> >
> > Thanks
> > Laurence
> Replying to my own message:
>
> Hi Bart, Mike
>
> Further testing has shown we are still exposed here so more investigation is
> necessary.
> The above patch seems to help but I still see sporadic cases of errors
> escaping up the stack.
>
> I expect you will see the same, so there is more work to do here to figure this out.
>
> Thanks
> Laurence
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

Hello Bart

I completely forgot I had set no_path_retry=12, so after 12 retries it will
error out. This is likely why I had different results, seemingly affected by
timing. Mike reminded me of it this morning.

What do you have set for no_path_retry? When I set it to queue, it blocks the
paths from coming back for some reason. I am now investigating why that is
happening :).

I see now that I need to add "simultaneous all paths lost" scenarios to my QA
testing, as it's not a common scenario.

Thanks
Laurence