Kasimir Müller wrote:
> Hi Tejun,
>
> Old communication appended below.
>
> I wish you a Happy Xmas and a successful New Year.
>
> I spent some time during Christmas to further investigate the problem. I
> bought a new 500GB disk and put all data on this disk.
> This is also continuously watched by nagios and cacti
> Then
> 1.) All 5 disks in the external case connected via Portmapper and sil24
>     card have excellent health-status with smartd.
> 2.) I get no(!!!!) errors at all if I use the disks as single drives or
>     with lvm. I verified this by copying large amounts of data (100-200GB)
>     with rsync, cp -av and running bonnie++ single and simultaneously
>     to various combinations of drives.
> 3.) I get the errors as soon as I use raid. Same errors with raid0 (2
>     disks), 1 (2 disks), 5 (3 disks) in any combination of the drives
> 4.) The errors usually appear first during mkfs (same with ext3 and
>     reiserfs) and then
>     after writing about 10-50 GB to the raid, and then repeat at 5 to
>     10 minute intervals according to the disk activity.
> 5.) I used Kernel 2.6.23.1 with your latest patch: same result
> 6.) I used kernel 2.6.24 patch rc-6 : same result
> 7.) during the tests I marked all files with md5-sums: No data
>     corruption (!!!), so maybe I can live with it.

Please apply the attached patch on top of 2.6.24-rc6 and report whether
anything changes.

Thanks.

-- 
tejun
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index f0124a8..74269ed 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -1733,11 +1733,15 @@ static void ata_eh_link_autopsy(struct ata_link *link)
 		ehc->i.action &= ~ATA_EH_PERDEV_MASK;
 	}
 
-	/* consider speeding down */
+	/* propagate timeout to host link */
+	if ((all_err_mask & AC_ERR_TIMEOUT) && !ata_is_host_link(link))
+		ap->link.eh_context.i.err_mask |= AC_ERR_TIMEOUT;
+
+	/* record error and consider speeding down */
 	dev = ehc->i.dev;
-	if (!dev && ata_link_max_devices(link) == 1 &&
-	    ata_dev_enabled(link->device))
-		dev = link->device;
+	if (!dev && ((ata_link_max_devices(link) == 1 &&
+		      ata_dev_enabled(link->device))))
+		dev = link->device;
 
 	if (dev)
 		ehc->i.action |= ata_eh_speed_down(dev, is_io, all_err_mask);
@@ -1759,8 +1763,14 @@ void ata_eh_autopsy(struct ata_port *ap)
 {
 	struct ata_link *link;
 
-	__ata_port_for_each_link(link, ap)
+	ata_port_for_each_link(link, ap)
 		ata_eh_link_autopsy(link);
+
+	/* Autopsy of fanout ports can affect host link autopsy.
+	 * Perform host link autopsy last.
+	 */
+	if (ap->nr_pmp_links)
+		ata_eh_link_autopsy(&ap->link);
 }
 
 /**
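
For readers following the thread, the ordering the patch introduces can be sketched outside the kernel. The model below is a rough user-space illustration, not libata code: every `toy_*` name and `TOY_*` constant is a hypothetical stand-in, and the assumption that the non-underscore iterator visits only the fanout links when a port multiplier is attached is my reading of the diff, not something stated in the mail. It shows a timeout on a fanout link being OR-ed into the host link's error mask, and the host link being examined last so that it sees the propagated bit.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the libata types; NOT the kernel definitions. */
#define TOY_ERR_TIMEOUT   (1u << 0)
#define TOY_MAX_PMP_LINKS 15

struct toy_port;

struct toy_link {
	struct toy_port *port;
	unsigned int err_mask;   /* errors recorded against this link */
	unsigned int seen_mask;  /* err_mask as observed when autopsied */
	bool autopsied;
};

struct toy_port {
	struct toy_link host;                   /* link to the host controller */
	struct toy_link pmp[TOY_MAX_PMP_LINKS]; /* fanout links behind the PMP */
	int nr_pmp_links;                       /* 0 means no port multiplier */
};

static bool toy_is_host_link(const struct toy_link *link)
{
	return link == &link->port->host;
}

/* Set up a port with nr_pmp_links fanout links and no recorded errors. */
static void toy_port_init(struct toy_port *ap, int nr_pmp_links)
{
	int i;

	memset(ap, 0, sizeof(*ap));
	ap->nr_pmp_links = nr_pmp_links;
	ap->host.port = ap;
	for (i = 0; i < TOY_MAX_PMP_LINKS; i++)
		ap->pmp[i].port = ap;
}

/* Examine one link; a timeout on a fanout link is also charged to the host link. */
static void toy_link_autopsy(struct toy_link *link)
{
	if ((link->err_mask & TOY_ERR_TIMEOUT) && !toy_is_host_link(link))
		link->port->host.err_mask |= TOY_ERR_TIMEOUT;

	link->seen_mask = link->err_mask;
	link->autopsied = true;
}

/* Examine the fanout links first, then the host link, so propagation is visible. */
static void toy_port_autopsy(struct toy_port *ap)
{
	int i;

	for (i = 0; i < ap->nr_pmp_links; i++)
		toy_link_autopsy(&ap->pmp[i]);

	/* Host link last; with no PMP attached it is the only link examined. */
	toy_link_autopsy(&ap->host);
}
```

Calling `toy_port_init(&ap, 3)`, setting `ap.pmp[1].err_mask = TOY_ERR_TIMEOUT`, and then running `toy_port_autopsy(&ap)` leaves the timeout bit set in `ap.host.seen_mask`. If the host link were examined first, it would be checked before the bit had been propagated.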