Hi Neil,
after reading the code of raid1.c, I see that there is also conf->retry_list,
which is flushed by raid1d but not by flush_pending_writes(). So I think it can
cause a similar deadlock, but I don't know how to fix it :(

Alex.

On Thu, Jun 6, 2013 at 6:00 PM, Tregaron Bayly <tbayly@xxxxxxxxxxxx> wrote:
> On 06/03/2013 07:49 PM, NeilBrown wrote:
>>
>> On Sun, 2 Jun 2013 15:43:41 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> wrote:
>>
>>> Hello Neil,
>>> I believe I have found what is causing the deadlock. It happens in two
>>> flavors:
>>>
>>> 1)
>>> # raid1d() is called, and conf->pending_bio_list is non-empty at this point
>>> # raid1d() calls md_check_recovery(), which eventually calls
>>> raid1_add_disk(), which calls raise_barrier()
>>> # Now raise_barrier() waits for conf->nr_pending to become 0, but it cannot
>>> become 0, because there are bios sitting in conf->pending_bio_list which
>>> nobody will flush: raid1d is the one that is supposed to call
>>> flush_pending_writes(), either directly or via handle_read_error(), but it
>>> is stuck in raise_barrier().
>>>
>>> 2)
>>> # raid1_add_disk() calls raise_barrier() and waits for conf->nr_pending to
>>> become 0, as before
>>> # a new WRITE arrives and calls wait_barrier(), but the calling thread has
>>> a non-empty current->bio_list
>>> # In this case the code allows the WRITE to go through wait_barrier() and
>>> trigger WRITEs to the mirror legs, but these WRITEs again end up in
>>> conf->pending_bio_list (either via raid1_unplug() or directly). Nobody will
>>> flush conf->pending_bio_list, because raid1d is stuck in raise_barrier().
>>>
>>> Previously, for example in kernel 3.2, raid1_add_disk() did not call
>>> raise_barrier(), so this problem did not happen.
>>>
>>> Attached is a reproduction with some prints that I added to raise_barrier()
>>> and wait_barrier() (their code is also attached). It demonstrates case 2:
>>> once raise_barrier() is called, conf->nr_pending drops until it equals the
>>> number of wait_barrier() calls that slipped through because of a non-empty
>>> current->bio_list. At that point the array hangs.
>>>
>>> Can you please comment on how to fix this problem? It looks like a real
>>> deadlock. We could perhaps call md_check_recovery() after
>>> flush_pending_writes(), but that only makes the window smaller; it does not
>>> close it entirely. It looks like we really should not be calling
>>> raise_barrier() from raid1d.
>>>
>>> Thanks,
>>> Alex.
>>
>> Hi Alex,
>> thanks for the analysis.
>>
>> Does the following patch fix it? It makes raise_barrier() more like
>> freeze_array().
>> If not, could you try making the same change to the first
>> wait_event_lock_irq in raise_barrier()?
>>
>> Thanks.
>> NeilBrown
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 328fa2d..d34f892 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -828,9 +828,9 @@ static void raise_barrier(struct r1conf *conf)
>>  	conf->barrier++;
>>
>>  	/* Now wait for all pending IO to complete */
>> -	wait_event_lock_irq(conf->wait_barrier,
>> -			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
>> -			    conf->resync_lock);
>> +	wait_event_lock_irq_cmd(conf->wait_barrier,
>> +				!conf->nr_pending && conf->barrier < RESYNC_DEPTH,
>> +				conf->resync_lock, flush_pending_writes(conf));
>>
>>  	spin_unlock_irq(&conf->resync_lock);
>>  }
>>
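For what it's worth, the reason the cmd variant can help is that the command
is executed on every pass through the wait loop, after conf->resync_lock has
been dropped, so raise_barrier() drains the queued WRITEs itself instead of
depending on raid1d (which is blocked inside raise_barrier()). A rough model
of what the patched wait does (simplified, not the literal
wait_event_lock_irq_cmd() macro; raise_barrier_wait_sketch is just an
illustrative name):

/* Must be entered with conf->resync_lock held, as in raise_barrier(). */
static void raise_barrier_wait_sketch(struct r1conf *conf)
{
	DEFINE_WAIT(w);

	for (;;) {
		prepare_to_wait(&conf->wait_barrier, &w, TASK_UNINTERRUPTIBLE);
		if (!conf->nr_pending && conf->barrier < RESYNC_DEPTH)
			break;				/* pending I/O has drained */
		spin_unlock_irq(&conf->resync_lock);
		flush_pending_writes(conf);		/* the "cmd": submit queued WRITEs ourselves */
		schedule();				/* wait for allow_barrier() to wake us */
		spin_lock_irq(&conf->resync_lock);
	}
	finish_wait(&conf->wait_barrier, &w);
}

As the queued WRITEs complete, conf->nr_pending falls and the condition can
finally become true.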
>
> Neil,
>
> This deadlock also cropped up in 3.4 between .37 and .38. Passing
> flush_pending_writes(conf) as cmd to wait_event_lock_irq seems to fix it
> there as well.
>
> --- linux-3.4.38/drivers/md/raid1.c	2013-03-28 13:12:41.000000000 -0600
> +++ linux-3.4.38.patch/drivers/md/raid1.c	2013-06-04 12:17:35.314194903 -0600
> @@ -751,7 +751,7 @@
>
>  	/* Now wait for all pending IO to complete */
>  	wait_event_lock_irq(conf->wait_barrier,
>  			    !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
> -			    conf->resync_lock, );
> +			    conf->resync_lock, flush_pending_writes(conf));
>
>  	spin_unlock_irq(&conf->resync_lock);
>  }
>
> Thanks,
>
> Tregaron
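One more data point on the retry_list question above: flush_pending_writes()
only drains conf->pending_bio_list and never looks at conf->retry_list, so
anything parked on retry_list still has to wait for raid1d itself. Roughly
what the function does (a simplified sketch from memory of raid1.c of that
era; plug handling and the empty-list shortcut are omitted):

static void flush_pending_writes(struct r1conf *conf)
{
	struct bio *bio;

	spin_lock_irq(&conf->device_lock);
	bio = bio_list_get(&conf->pending_bio_list);	/* grab every queued WRITE */
	conf->pending_count = 0;
	spin_unlock_irq(&conf->device_lock);

	bitmap_unplug(conf->mddev->bitmap);	/* bitmap updates must reach disk first */
	wake_up(&conf->wait_barrier);

	while (bio) {				/* submit them; their completions
						 * eventually call allow_barrier()
						 * and drop conf->nr_pending */
		struct bio *next = bio->bi_next;
		bio->bi_next = NULL;
		generic_make_request(bio);
		bio = next;
	}
}

Entries on conf->retry_list, by contrast, are only picked up by the raid1d
loop itself, which is why the concern at the top of this mail is not addressed
by making flush_pending_writes() the wait command.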