Hi Andrew, there is a regression in the qla2xxx driver, introduced by this commit: commit 083a469db4ecf3b286a96b5b722c37fc1affe0be Author: Giridhar Malavali <giridhar.malavali@xxxxxxxxxx> Date: Fri May 28 15:08:18 2010 -0700 [SCSI] qla2xxx: Correct use-after-free oops seen during EH-abort. Hold a reference to the srb (sp) while aborting an I/O -- as the I/O can/will complete from within the interrupt-context. Signed-off-by: Andrew Vasquez <andrew.vasquez@xxxxxxxxxx> Signed-off-by: Giridhar Malavali <giridhar.malavali@xxxxxxxxxx> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxx> With this patch a reference counting is introduced for srb's. However, there is this code in qla2xxx_eh_abort(): spin_unlock_irqrestore(&ha->hardware_lock, flags); /* Wait for the command to be returned. */ if (wait) { if (qla2x00_eh_wait_on_command(cmd) != QLA_SUCCESS) { qla_printk(KERN_ERR, ha, "scsi(%ld:%d:%d): Abort handler timed out -- %lx " "%x.\n", vha->host_no, id, lun, serial, ret); ret = FAILED; } } if (got_ref) qla2x00_sp_compl(ha, sp); where qla2x00_eh_wait_on_command() is waiting for a command to be completed by the midlayer. Which will never happen, as the refcount is held during that time and only released on the last lines. Hence any command abort will be timed out and the error will be escalated further. I have fixed it by simply moving the last two lines above the 'if (wait)' condition. however I fail to see the race condition mentioned, and hence the validity of the reference counting in the first place. So it might be that I'm missing something subtle here, so I would ask you to have a look here. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxx +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html