Hello Bart

Around 300s passed before the paths were declared hard failed and the devices
offlined; that is when I/O restarts. The remaining paths on the second Qlogic
port (the ones that are not jammed) will not be used until the error handler
activity completes. Until we get messages like the ones below and device-mapper
starts declaring paths down, we are blocked:

Apr 29 17:20:51 localhost kernel: sd 1:0:1:0: Device offlined - not ready after error recovery
Apr 29 17:20:51 localhost kernel: sd 1:0:1:13: Device offlined - not ready after error recovery

Laurence Oberman
Principal Software Maintenance Engineer
Red Hat Global Support Services

----- Original Message -----
From: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>
To: "Laurence Oberman" <loberman@xxxxxxxxxx>
Cc: "James Bottomley" <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>, "linux-scsi" <linux-scsi@xxxxxxxxxxxxxxx>, "Mike Snitzer" <snitzer@xxxxxxxxxx>, linux-block@xxxxxxxxxxxxxxx, "device-mapper development" <dm-devel@xxxxxxxxxx>, lsf@xxxxxxxxxxxxxxxxxxxxxxxxxx
Sent: Friday, April 29, 2016 8:36:22 PM
Subject: Re: [Lsf] Notes from the four separate IO track sessions at LSF/MM

On 04/29/2016 02:47 PM, Laurence Oberman wrote:
> Recovery with 21 LUNs that have in-flights to abort takes 300s.
> [ ... ]
> eh_deadline is set to 10 on the 2 Qlogic ports, eh_timeout is set
> to 10 for all devices. In multipath, fast_io_fail_tmo=5.
>
> I jam one of the target array ports and discard the commands,
> effectively black-holing them, and leave it that way until we
> recover, while I watch the I/O. The recovery takes around 300s even
> with all the tuning, and this effectively ends up in Oracle cluster
> evictions.

Hello Laurence,

This discussion started as a discussion about the time needed to fail over
from one path to another. How long did it take in your test before I/O
failed over from the jammed port to another port?

Thanks,

Bart.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
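For readers unfamiliar with the tunables referenced in the quoted test setup,
a minimal sketch of where they live in sysfs follows; the host, device, and
rport names are placeholders, not values taken from this report.
fast_io_fail_tmo is normally set in multipath.conf and programmed onto the FC
remote ports by multipathd rather than written by hand:

  # Per-HBA error-handler deadline (seconds), one per Qlogic host
  echo 10 > /sys/class/scsi_host/host1/eh_deadline

  # Per-device error-handler command timeout (seconds)
  for d in /sys/bus/scsi/devices/1:0:*:*/eh_timeout; do
      echo 10 > "$d"
  done

  # FC remote-port fast_io_fail_tmo, the attribute that multipath's
  # fast_io_fail_tmo=5 ultimately controls
  echo 5 > /sys/class/fc_remote_ports/rport-1:0-0/fast_io_fail_tmo

Even with these values in place, the failover being discussed waits for the
SCSI error handler to finish aborting in-flight commands and offlining the
jammed paths, which is the roughly 300s delay described above.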