On Tue, Oct 31, 2006 at 08:51:08AM -0500, James Smart wrote:
> Linas,
>
> I don't know of anything in this area.
> I also need a deeper understand of what the error was, and how,
> that was injected. This play into it.

When the PCI slot is frozen, the PCI bridge will block all writes
to the device, and will return all 0xffffffff for reads. All DMA
will be prevented from going through.

> Also, PCI error recovery is not a simple task.

I've implemented it for the ipr and symbios SCSI controllers, and
for the e100, e1000, ixgb and s2io ethernet cards. If you review the
actual code, you will see it's fairly tiny. Mostly I've discovered
that if the device driver has clean, clear-cut device-up/device-down
routines, then recovery is straightforward.

FWIW, I've run some of the kernels & devices through 48-hour runs
with thousands of errors injected and successfully recovered from.

> There are many
> aspects to the adapter messaging interface and the affects of the
> PCI error recovery scheme that has to be closely looked at. DMA
> errors can be very fatal, even if the PCI bus survives. In many
> cases, the only safe recovery is a hard adapter reset (with little
> to no interaction with the adapter to clean up).

Currently, all of the device drivers I mention above perform the
recovery with a hard reset. The generic API does not require this,
but this seems to be the simplest, most robust/reliable route.
I experimented with non-hard-reset on the s2io, which I got "almost
working". I don't know that it's worth the trouble.

Just to be clear, I'm referring to the infrastructure documented in
Documentation/pci-error-recovery.txt

--linas
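
To make the frozen-slot behaviour and the hard-reset route described above concrete, here is a rough userspace model in C. It is only a sketch, not kernel code and not the API from Documentation/pci-error-recovery.txt: every name in it (sim_device, sim_read, sim_check_and_recover, and so on) is made up for illustration. It assumes the driver has clean device-down/device-up routines, and it treats an all-ones read as the sign of a frozen slot, which a real driver could not rely on alone since a register may legitimately read 0xffffffff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one device behind a PCI bridge (illustration only). */
struct sim_device {
    bool slot_frozen;   /* bridge has isolated the slot after an error */
    bool up;            /* driver considers the device operational */
    uint32_t reg;       /* a single device register */
    unsigned resets;    /* number of hard resets performed so far */
};

/* While the slot is frozen, every read comes back as all ones. */
static uint32_t sim_read(const struct sim_device *d)
{
    return d->slot_frozen ? 0xffffffffu : d->reg;
}

/* While the slot is frozen, the bridge drops writes to the device. */
static void sim_write(struct sim_device *d, uint32_t v)
{
    if (!d->slot_frozen)
        d->reg = v;
}

/* Assumed clean "device-down" routine: stop using the hardware. */
static void sim_device_down(struct sim_device *d)
{
    d->up = false;
}

/* Hard reset: unfreezes the slot and discards all device state. */
static void sim_hard_reset(struct sim_device *d)
{
    d->slot_frozen = false;
    d->reg = 0;
    d->resets++;
}

/* Assumed clean "device-up" routine: reinitialize from scratch. */
static void sim_device_up(struct sim_device *d)
{
    d->reg = 0x1;       /* hypothetical "enabled" register value */
    d->up = true;
}

/*
 * If a read returns all ones, treat the slot as frozen and recover
 * with down -> hard reset -> up. Returns true if recovery was run.
 */
static bool sim_check_and_recover(struct sim_device *d)
{
    if (sim_read(d) != 0xffffffffu)
        return false;
    sim_device_down(d);
    sim_hard_reset(d);
    sim_device_up(d);
    return true;
}
```

The point of the model is that recovery needs no interaction with the adapter beyond the reset itself: nothing is read back from the device while it is frozen, and device-up rebuilds all state as it would on a fresh start.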