On Fri, 16 Sep 2005, Mike Anderson wrote:

> > This makes me suspect that the condition about host_busy == host_failed is
> > wrong.  Unfortunately I don't know why it's wrong or how to fix it.
> >
> > Perhaps somebody on the SCSI list can provide the answer.
> >
>
> What condition are you thinking would happen if this was wrong (we are
> getting woken up too early?)?

Yes, that is what would happen.  Or failing to go back to sleep when we
should, which might be even worse.

> I did a quick look and could not see changes
> between 2.6.13 and 2.6.14-rc1 that would make these values wrong. This is
> just a check to ensure the eh is not woken up too early. Historically in
> older scsi eh code there used to be a panic if the error handler was woken
> up too early. In scsi_unjam_host and a quick look at ata_scsi_error getting
> woken up early should not cause a panic.
>
> I built a listfile (libata-scsi.lst) and it is probably not an exact
> match. ..but..
>
> These lines in ata_scsi_error(..) appear to be close to the failure and
> edx being zero as shown above in the oops would not be good.
>         ap->ops->eng_timeout(ap);
>  499:   8b 50 04                mov    0x4(%eax),%edx
>  49c:   ff 52 48                call   *0x48(%edx)
>
> Since I do not know the libata code it is unclear from doing a short
> search how an ops pointer could get altered or if my observations are
> correct.

Maybe the wakeup occurred before ap->ops was set correctly, or after it
was unset.

Jan, at what point did the oops happen?  Was it right after the device was
detected, during removal, or some other time?  Can you put in some
debugging printk's to see what values are in ap, ap->ops, and
ap->ops->eng_timeout?

Alan Stern
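
For illustration only, the pointer checks requested above could be
sketched roughly as follows.  This is a userspace mock, not the actual
libata code: the two structs are cut-down, hypothetical stand-ins that
model only the fields named in this thread, and dump_port() and
stub_timeout() are made-up helper names.  In the kernel the fprintf()
calls would be printk(KERN_DEBUG ...) placed just before the
ap->ops->eng_timeout(ap) call in ata_scsi_error().

```c
#include <stdio.h>

/* Hypothetical stand-ins for the libata structures; only the fields
 * discussed in this thread are modelled. */
struct ata_port;

struct ata_port_operations {
	void (*eng_timeout)(struct ata_port *ap);
};

struct ata_port {
	const struct ata_port_operations *ops;
};

/* Made-up placeholder callback, only for trying the helper out. */
static void stub_timeout(struct ata_port *ap)
{
	(void)ap;
}

/*
 * Report each link in the chain ap -> ap->ops -> ap->ops->eng_timeout,
 * stopping at the first NULL so the dump itself cannot oops.
 * Returns 1 if calling ap->ops->eng_timeout(ap) would be safe, else 0.
 */
static int dump_port(const struct ata_port *ap)
{
	fprintf(stderr, "eh: ap=%p\n", (const void *)ap);
	if (!ap)
		return 0;

	fprintf(stderr, "eh: ap->ops=%p\n", (const void *)ap->ops);
	if (!ap->ops)
		return 0;

	fprintf(stderr, "eh: ap->ops->eng_timeout is %s\n",
		ap->ops->eng_timeout ? "set" : "NULL");
	return ap->ops->eng_timeout != NULL;
}
```

A zero return from a check like this at the time of the wakeup would be
consistent with edx being zero in the oops, though it would not by itself
say whether ops had not been set yet or had already been cleared.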