Hello Dan,

On Wednesday 23 April 2008, Dan Williams wrote:
> On Sat, Apr 12, 2008 at 11:16 AM, Bernd Schubert <bernd-schubert@xxxxxx> wrote:
> > Hello,
> >
> > last night we had scsi problems and a hardware raid
> > unit was offlined during heavy i/o. While this happened we got for
> > about 3 minutes a huge number messages like these
> >
> > Apr 12 03:36:07 pfs1n14 kernel: [197510.696595] raid5:md7: read error
> > not correctable (sector 2993096568 on sdj2).
> >
> > I guess the high error rate is responsible for not scheduling other
> > events - during this time the system was not pingable and in the end
> > also other devices run into scsi command timeouts causing problems on
> > these unrelated devices as well.
> >
> >
> > Signed-off-by: Bernd Schubert <bernd-schubert@xxxxxx>
>
> Hi Bernd,
>
> This patch is whitespace damaged (tabs-->spaces). Can you resend as
> an attachment?

hmm, don't know how I managed to do that. Probably copied it from the shell...
I have attached it this time. I also just added another printk_ratelimit().

Btw, from my point of view the

	if (printk_ratelimit())
		printk("print output");

looks odd. I just don't see why the API isn't

	printk_ratelimit("print output");

Oh well, modifying this all over the code would give a huge almost useless
patch _only_ improving the beauty of code.

Thanks,
Bernd
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b162b83..60d3442 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1141,10 +1141,12 @@ static void raid5_end_read_request(struct bio * bi, int error)
 		set_bit(R5_UPTODATE, &sh->dev[i].flags);
 		if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
 			rdev = conf->disks[i].rdev;
-			printk(KERN_INFO "raid5:%s: read error corrected (%lu sectors at %llu on %s)\n",
-			       mdname(conf->mddev), STRIPE_SECTORS,
-			       (unsigned long long)(sh->sector + rdev->data_offset),
-			       bdevname(rdev->bdev, b));
+			if (printk_ratelimit())
+				printk(KERN_INFO "raid5:%s: read error corrected"
+				       " (%lu sectors at %llu on %s)\n",
+				       mdname(conf->mddev), STRIPE_SECTORS,
+				       (unsigned long long)(sh->sector + rdev->data_offset),
+				       bdevname(rdev->bdev, b));
 			clear_bit(R5_ReadError, &sh->dev[i].flags);
 			clear_bit(R5_ReWrite, &sh->dev[i].flags);
 		}
@@ -1157,19 +1159,20 @@ static void raid5_end_read_request(struct bio * bi, int error)
 
 		clear_bit(R5_UPTODATE, &sh->dev[i].flags);
 		atomic_inc(&rdev->read_errors);
-		if (conf->mddev->degraded)
+		if (conf->mddev->degraded && printk_ratelimit())
 			printk(KERN_WARNING "raid5:%s: read error not correctable (sector %llu on %s).\n",
 			       mdname(conf->mddev),
 			       (unsigned long long)(sh->sector + rdev->data_offset),
 			       bdn);
-		else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
+		else if (test_bit(R5_ReWrite, &sh->dev[i].flags) &&
+			 printk_ratelimit())
 			/* Oh, no!!! */
 			printk(KERN_WARNING "raid5:%s: read error NOT corrected!! (sector %llu on %s).\n",
 			       mdname(conf->mddev),
 			       (unsigned long long)(sh->sector + rdev->data_offset),
 			       bdn);
 		else if (atomic_read(&rdev->read_errors)
-			 > conf->max_nr_stripes)
+			 > conf->max_nr_stripes && printk_ratelimit())
 			printk(KERN_WARNING
 			       "raid5:%s: Too many read errors, failing device %s.\n",
 			       mdname(conf->mddev), bdn);
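
For illustration only, here is a rough userspace sketch of the combined check-and-print call wished for in the message above. It is not kernel code and not part of the patch: `struct ratelimit`, `ratelimit_ok()` and `print_ratelimited()` are hypothetical names, the fixed-window policy is an assumption, and the caller passes the current time in abstract ticks so the behaviour is easy to follow.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a rate limiter: allow at most
 * `burst` messages within each window of `interval` ticks. */
struct ratelimit {
	long interval;      /* window length in ticks */
	int burst;          /* messages allowed per window */
	long window_start;  /* tick at which the current window began */
	int printed;        /* messages printed in the current window */
	int suppressed;     /* total messages dropped so far */
};

/* Separate check, analogous in spirit to "if (printk_ratelimit())". */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	/* Start a fresh window once the old one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* Combined check and print: returns the number of characters written,
 * or 0 if the message was suppressed. */
static int print_ratelimited(struct ratelimit *rl, long now,
			     const char *fmt, ...)
{
	va_list ap;
	int n;

	if (!ratelimit_ok(rl, now))
		return 0;

	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	return n;
}
```

With a limiter initialised as `{ .interval = 5, .burst = 2 }`, two calls at ticks 0 and 1 print, a third call at tick 2 is dropped and counted in `suppressed`, and printing resumes at tick 5. Keeping the check inside the print helper also keeps it out of the surrounding `if`/`else if` conditions, so suppressing a message cannot change which branch is taken.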