On Thu, Apr 29, 2010 at 3:14 AM, Richard Mawson <richard@xxxxxxxxxxxxxxxxxxx> wrote:
> Tim,
>
> On Mon, Apr 26, 2010 at 12:59:05PM +0100, Tim Small wrote:
>> If you want to try to debug this further - you could turn on PCI parity
>> error detection (either using EDAC module, or via userspace with
>> lspci/setpci)?
>>
>> # modprobe edac_core
>> # echo 1 > /sys/module/edac_core/parameters/check_pci_errors
>>
>> If you're after a different solution for that machine, you can buy Sii
>> 3124 based cards (PCI-X to 4x SATA) for about the same price as that
>> adaptor....
>>
>> http://www.siliconimage.com/products/product.aspx?pid=27
>
> Thanks for your suggestions.
>
> I'm not too familiar with debugging pci errors, but I'm willing to try things
> out if there are suggestions as to what to look for.
>
> Having moved this to another system, still using the pci-pcie bridge, there
> are problems too -- it just takes longer to show up. The system locks up when
> copying large quantities of data to the disks.
>
> The symptom is the following code in the interrupt handler being called many
> many times:
>
>     if (status == 0xffffffff) {
>         printk(KERN_ERR DRV_NAME ": IRQ status == 0xffffffff, "
>                "PCI fault or device removal?\n");
>
> Does this indicate a hardware error? Is there a safe way to reset the device
> in this state to avoid the repeated calls to the interrupt handler that I
> suspect is the cause of the machine being unresponsive?
>
> I'm looking into pci debugging techniques, but any pointers would be welcome.

Register reads returning all 1s indicate that PCI aborts are likely
happening: either the bridge or the chip itself has stopped responding.
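
To illustrate the all-1s check, here is a rough userspace sketch (not the
driver's actual code) of how a handler could stop servicing a device whose
status register keeps reading back 0xffffffff. The names irq_guard,
irq_guard_check and MAX_BAD_READS are made up for this example, and the
threshold of 8 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A read from a device that no longer responds typically returns all 1s. */
#define ALL_ONES_STATUS 0xffffffffu
/* Hypothetical threshold: consecutive bad reads before giving up. */
#define MAX_BAD_READS 8u

/* Hypothetical per-device bookkeeping; not a kernel structure. */
struct irq_guard {
    unsigned int bad_reads;   /* consecutive all-1s status reads seen */
    bool device_gone;         /* set once the threshold is reached */
};

enum irq_action {
    IRQ_HANDLE,   /* status looks sane, process it normally */
    IRQ_IGNORE,   /* skip this interrupt, do not touch the device */
    IRQ_DISABLE   /* caller should mask the interrupt line now */
};

/* Decide what to do with one status value read in the interrupt handler. */
static enum irq_action irq_guard_check(struct irq_guard *g, uint32_t status)
{
    assert(g != NULL);

    if (g->device_gone)
        return IRQ_IGNORE;

    if (status != ALL_ONES_STATUS) {
        g->bad_reads = 0;     /* a good read resets the counter */
        return IRQ_HANDLE;
    }

    if (++g->bad_reads >= MAX_BAD_READS) {
        g->device_gone = true;
        return IRQ_DISABLE;   /* report this exactly once */
    }
    return IRQ_IGNORE;
}
```

In a real driver the IRQ_DISABLE case would presumably map to something like
disable_irq_nosync() plus scheduling a reset outside interrupt context, but
whether that is safe for this controller behind the bridge is an open
question, not something this sketch settles.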