On Wed, 2008-02-20 at 17:54 +0800, Keith Hopkins wrote:
> On 02/20/2008 11:48 AM, James Bottomley wrote:
> > On Tue, 2008-02-19 at 10:22 -0600, James Bottomley wrote:
> >> I'll see if I can come up with patches to fix this ... or at least
> >> mitigate the problems it causes.
> >
> > Darrick's working on the ascb sequencer use after free problem.
> >
> > I looked into some of the error handling in libsas, and apparently
> > that's a bit of a huge screw up too. There are a number of places where
> > we won't complete a task that is being errored out and thus causes
> > timeout errors. This patch is actually for libsas to fix all of this.
> >
> > I've managed to reproduce some of your problem by firing random resets
> > across a disk under load, and this recovers the protocol errors for me.
> > However, I can't reproduce the TMF timeout which caused the sequencer
> > screw up, so you still need to wait for Darrick's fix as well.
> >
> > James
> >
>
> Hi James, Darrick,
>
> Thanks again for looking more into this. I'll wait for Darrick's
> patch and try it together with this libsas patch. Should I leave
> James' first patch in also?

Yes, that's a requirement just to get the REQ_TASK_ABORT for the
protocol errors actually to work ... I'm afraid this is like peeling an
onion as I said .. and you're going to build up layers of patches.

However, the ones that are obvious bug fixes and I can test (all of them
so far), I'm putting in the rc fixes tree of SCSI, so you can download a
rollup here:

http://www.kernel.org/pub/linux/kernel/people/jejb/scsi-rc-fixes-2.6.diff

James