On Tue, 16 Sep 2014 11:31:26 -0500 Brassow Jonathan <jbrassow@xxxxxxxxxx>
wrote:

>
> On Sep 14, 2014, at 10:30 PM, NeilBrown wrote:
>
> > On Thu, 11 Sep 2014 12:12:01 -0500 Brassow Jonathan <jbrassow@xxxxxxxxxx>
> > wrote:
> >
> >>
> >> On Sep 10, 2014, at 10:45 PM, Brassow Jonathan wrote:
> >>
> >>>
> >>> On Sep 10, 2014, at 1:20 AM, NeilBrown wrote:
> >>>
> >>>>
> >>>> Jon: could you test with these patches on top of what you
> >>>> have just in case something happens to fix the problem without
> >>>> me realising it?
> >>>
> >>> I'm on it.  The test is running.  I'll know later tomorrow.
> >>>
> >>> brassow
> >>
> >> The test is still failing from here.  I grabbed 3.17.0-rc4, added the 5 patches, and got the attached backtraces when testing.  As I said, the hangs are not exactly the same.  This set shows the mdX_raid1 thread in the middle of handling a read failure.
> >
> > Thanks.
> > mdX_raid1 is blocked in freeze_array.
> > That could be caused by conf->nr_pending not aligning properly with
> > conf->nr_queued.
> >
> > Both normal IO and resync IO can be retried with reschedule_retry()
> > and so be counted into ->nr_queued, but only normal IO gets counted in
> > ->nr_pending.
> >
> > Previously we could only possibly have one or the other and when handling
> > a read failure it could only be normal IO.  But now that the two types can
> > interleave, we can have both normal and resync IO requests queued, so we need
> > to count them both in nr_pending.
> >
> > So the following patch might help.
> >
> > How complicated are your test scripts?  Could you send them to me so I can
> > try too?
> >
> > Thanks,
> > NeilBrown
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 888dbdfb6986..6a9c73435eb8 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -856,6 +856,7 @@ static void raise_barrier(struct r1conf *conf, sector_t sector_nr)
> >  			     conf->next_resync + RESYNC_SECTORS),
> >  			    conf->resync_lock);
> >
> > +	conf->nr_pending++;
> >  	spin_unlock_irq(&conf->resync_lock);
> >  }
> >
> > @@ -865,6 +866,7 @@ static void lower_barrier(struct r1conf *conf)
> >  	BUG_ON(conf->barrier <= 0);
> >  	spin_lock_irqsave(&conf->resync_lock, flags);
> >  	conf->barrier--;
> > +	conf->nr_pending--;
> >  	spin_unlock_irqrestore(&conf->resync_lock, flags);
> >  	wake_up(&conf->wait_barrier);
> >  }
>
> No luck, it is failing faster than before.
>
> I haven't looked into this myself, but the dm-raid1.c code makes use of dm-region-hash.c which coordinates recovery and nominal I/O in a way that allows them to both occur in a simple, non-overlapping way.  I'm not sure it would make sense to use that instead of this new approach.  I have no idea how much effort that would be, but I could have someone look into it at some point if you think it might be interesting.
>

Hi Jon,
 I can see the appeal of using known-working code, but there is every chance
 that we would break it when plugging it into md ;-)

 I've found another bug.... it is a very subtle one and it has been around
 since before the patch you bisected to so it probably isn't your bug.
 It also only affects arrays with bad-blocks listed.  The patch is below
 but I very much doubt testing will show any change...

 I'll keep looking.....

 oh, found one.  This one looks more convincing.
 If memory is short, make_request() will allocate an r1bio from the mempool
 rather than from the slab.  That r1bio won't have just been zeroed.
 This is mostly OK as we initialise all the fields that aren't left in
 a clean state ... except ->start_next_window.
 We initialise that for write requests, but not for read.
 So when we use a mempool-allocated r1bio that was previously used for
 write and had ->start_next_window set, and is now used for read,
 then things will go wrong.
 So this patch definitely is worth testing.
 Thanks for your continued patience in testing!!!

Thanks,
NeilBrown

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a95f9e179e6f..7187d9b8431f 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1185,6 +1185,7 @@ read_again:
 				   atomic_read(&bitmap->behind_writes) == 0);
 		}
 		r1_bio->read_disk = rdisk;
+		r1_bio->start_next_window = 0;
 
 		read_bio = bio_clone_mddev(bio, GFP_NOIO, mddev);
 		bio_trim(read_bio, r1_bio->sector - bio->bi_iter.bi_sector,
@@ -1444,6 +1445,7 @@ read_again:
 		r1_bio->state = 0;
 		r1_bio->mddev = mddev;
 		r1_bio->sector = bio->bi_iter.bi_sector + sectors_handled;
+		start_next_window = wait_barrier(conf, bio);
 		goto retry_write;
 	}
 
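
To make the stale ->start_next_window case above concrete, here is a minimal
userspace sketch in C.  It is not the raid1 code: struct toy_r1bio,
toy_alloc() and the other toy_* names are made up for illustration, and the
single static "reserve" object only models a pool element that is handed
back without being zeroed when memory is short.

```c
#include <string.h>

/* Toy stand-in for the kernel's struct r1bio (hypothetical layout). */
struct toy_r1bio {
	int is_write;
	int read_disk;
	unsigned long start_next_window;
};

/* One preallocated object modelling a pool reserve: reused, never zeroed. */
static struct toy_r1bio reserve;

static struct toy_r1bio *toy_alloc(int memory_is_short)
{
	static struct toy_r1bio fresh;

	if (memory_is_short)
		return &reserve;		/* recycled: old field values survive */
	memset(&fresh, 0, sizeof(fresh));	/* models a freshly zeroed allocation */
	return &fresh;
}

/* The write path sets start_next_window. */
static void toy_setup_write(struct toy_r1bio *b, unsigned long window)
{
	b->is_write = 1;
	b->start_next_window = window;
}

/* The read path only clears start_next_window when apply_fix is set. */
static void toy_setup_read(struct toy_r1bio *b, int rdisk, int apply_fix)
{
	b->is_write = 0;
	b->read_disk = rdisk;
	if (apply_fix)
		b->start_next_window = 0;
}

/* Use an object for a write, then allocate again and use it for a read. */
unsigned long toy_window_seen_by_read(int memory_is_short, int apply_fix)
{
	struct toy_r1bio *b;

	b = toy_alloc(memory_is_short);
	toy_setup_write(b, 128);

	b = toy_alloc(memory_is_short);
	toy_setup_read(b, 0, apply_fix);
	return b->start_next_window;
}
```

In this toy model the read only sees a leftover window value when the object
came from the reserve and the read path did not reset the field, which is the
combination the first hunk of the patch is meant to close.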