So my position changes a little, now that you've pointed out that you release the request queue when fastfail kicks in.

Mike Christie wrote:
> Do we want to fail IO that was sitting in the queue _and_ all new
> incoming IO or just what was sitting in the queue?
I believe all I/O - so that the upper layer sees everything occurring on each state change and can choose accordingly.
> The patches I sent unplug the queues when the failfast timer expires, so
> that is where the chkready test for failfast comes in. When failfast
> fires, the queue will be unplugged, we will hit the failfast test, and
> anything coming through will be failed. Alternatively:
>
> 1. What about making the transport chkready test standard and adding a
> transportt->check_ready callout which gets called before
> scsi_dispatch_cmd calls the queuecommand?
I like the idea, as I hated having to add this snippet to each driver. The other place we had to use it was in slave_alloc(), so doing the same thing there would be great.
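To make option 1 concrete, here is a minimal user-space sketch of the callout pattern. Every name in it (transport_template, check_ready, dispatch_cmd, lld_queuecommand) is a hypothetical stand-in for illustration, not the actual midlayer API: the dispatch path consults the transport's check_ready callout before calling the LLD, so a not-ready transport fails the command fast without the driver ever seeing it.

```c
#include <stddef.h>

/*
 * Minimal user-space model of option 1. All of these names are
 * hypothetical stand-ins; the real midlayer structures differ.
 */

enum disp_result { DISP_OK, DISP_FAILED_FAST };

struct transport_template {
	/* Return 0 when the transport is ready, nonzero to fail fast. */
	int (*check_ready)(void *transport_obj);
};

struct scsi_cmd {
	struct transport_template *transportt;
	void *transport_obj;
	int queued;		/* set when the LLD queuecommand runs */
};

/* Stand-in for the LLD's queuecommand. */
static int lld_queuecommand(struct scsi_cmd *cmd)
{
	cmd->queued = 1;
	return 0;
}

/*
 * Model of scsi_dispatch_cmd(): consult the transport's check_ready
 * callout first, and fail the command without ever calling the driver
 * when the transport reports not-ready.
 */
static enum disp_result dispatch_cmd(struct scsi_cmd *cmd)
{
	if (cmd->transportt && cmd->transportt->check_ready &&
	    cmd->transportt->check_ready(cmd->transport_obj))
		return DISP_FAILED_FAST;
	lld_queuecommand(cmd);
	return DISP_OK;
}
```

Because the callout is a single optional function pointer on the transport template, a transport that doesn't care simply leaves it NULL and dispatch behaves as today, which is what makes it attractive compared to open-coding the chkready snippet in every driver.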
> 2. Another option could be to add some code which does it a layer
> higher, at the scsi device level. The function would set the scsi_device
> state to some value that indicates the device is not ready and wants to
> fail IO, then it would unplug the queue. The scsi_prep_fn would then
> check for that state and fail IO. Or we could just set the state to an
> existing value like offline, so we would not have to modify any existing
> state checks.
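A rough model of option 2, with invented names (SDEV_FAILFAST, prep_fn, and the return values are illustrative only, not the kernel's actual sdev_state enum or prep return codes): the transport flips the device state and unplugs the queue, and the prep check then kills every request before it reaches the driver.

```c
/*
 * Minimal model of option 2: a hypothetical "not ready, fail IO"
 * scsi_device state that the prep function checks. The state and
 * return names are invented for illustration, not the kernel's own.
 */

enum sdev_state { SDEV_RUNNING, SDEV_FAILFAST };

enum prep_ret { PREP_OK, PREP_KILL };

struct scsi_device {
	enum sdev_state state;
};

/*
 * Model of the scsi_prep_fn check: once the transport marks the device
 * failfast and unplugs the queue, every request reaching prep is
 * killed before it gets anywhere near the driver.
 */
static enum prep_ret prep_fn(struct scsi_device *sdev)
{
	if (sdev->state == SDEV_FAILFAST)
		return PREP_KILL;
	return PREP_OK;
}
```

Reusing an existing state such as offline, as suggested above, would mean no new branch is needed at all, since the prep path already fails IO for that state.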
> 3. Or we could go one layer higher than that and add and set some block
> layer bits. Unblock the queue, and before the scsi_prep_fn is called the
> block layer would check the state bit and fail IO.

Yep, and likely a better decision long term (though more work), as it's storage-transport agnostic. The "other layer" guys can answer better on this one. I would think we still have to keep the chkready()s, as there will always be race conditions.
> 4. Those would work if we want to fail IO that was queued and new
> incoming IO. If we want to just fail IO that was queued, and queue new
> incoming IO, then the block layer could offer a function which grabs
> the queue lock, dequeues what was there and then fails each IO. scsi-ml
> would then call that function as a helper. SCSI-ml again would not see
> the IO here.
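A minimal user-space sketch of the option 4 helper, under stated assumptions (the request/queue structures and the name fail_queued_requests are invented for illustration; the real block layer's request queue is far richer): grab the queue lock, dequeue everything currently pending, fail each one, and leave the queue itself intact so new IO keeps queueing.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Minimal user-space model of option 4: a block-layer helper that,
 * under the queue lock, dequeues every request already sitting on the
 * queue and fails it, while leaving the queue usable so that new
 * incoming IO keeps queueing. All names are invented for illustration.
 */

#define REQ_STATUS_FAILED (-5)	/* stand-in for an -EIO style completion */

struct request {
	struct request *next;
	int status;
};

struct req_queue {
	pthread_mutex_t lock;	/* stand-in for the queue lock */
	struct request *head;
};

/* Dequeue and fail everything currently queued; return how many. */
static int fail_queued_requests(struct req_queue *q)
{
	int failed = 0;

	pthread_mutex_lock(&q->lock);
	while (q->head) {
		struct request *rq = q->head;

		q->head = rq->next;		/* dequeue */
		rq->status = REQ_STATUS_FAILED;	/* complete with error */
		failed++;
	}
	pthread_mutex_unlock(&q->lock);
	return failed;
}
```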
True - but this is a new and different abort routine from the one today that expects to kick in on timeout. We purposely stopped the eh handler from aborting a command while connectivity was lost. So it would be a bad idea to go down this path.

-- james s