Re: 3.7-rc4 hang with mdadm raid10 near layout, with 4 disks, and an internal bitmap

On Tue, 13 Nov 2012 19:11:31 +0100 Peter Maloney
<peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:

> I am using kernel 3.7-rc4. I have 2 LV on a 4 disk raid10 near layout
> mdadm device which I am trying to copy to another LV on the same VG
> using dd. The mdadm device has an internal bitmap. When I copy the first
> LV, it goes smoothly, but with the 2nd it hangs before it is done.
> 
> I don't know if I can reproduce it, so I'll just leave it broken with a
> bitmap until this is resolved, and work around by using files for what
> I'm trying to do right now. It's my home desktop machine.
> 
> Should I start by installing 3.6.6 or 3.7.0-rc5?
> 
> When I created my raid, I was probably using kernel 3.1 or 3.4.4. When I
> added the bitmap, it was probably 3.7-rc2.
> 
> # uname -a
> Linux peter 3.7.0-rc4-1-default #7 SMP Sun Nov 4 23:11:57 CET 2012
> x86_64 x86_64 x86_64 GNU/Linux
> 

....


Thanks for the report.
Should be fixed by the following.

NeilBrown

Author: NeilBrown <neilb@xxxxxxx>
Date:   Tue Nov 27 12:14:40 2012 +1100

    md/raid1{,0}: fix deadlock in bitmap_unplug.
    
    If the raid1 or raid10 unplug function gets called
    from a make_request function (which is very possible) when
    there are bios on the current->bio_list list, then it will not
    be able to successfully call bitmap_unplug(), as that could
    need to submit more bios and wait for them to complete.
    But they won't complete while current->bio_list is non-empty.
    
    So detect that case and hand the unplugging off to another thread
    just like we already do when called from within the scheduler.
    
    RAID1 version of bug was introduced in 3.6, so that part of fix is
    suitable for 3.6.y.  RAID10 part won't apply.
    
    Cc: stable@xxxxxxxxxxxxxxx
    Reported-by: Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx>
    Reported-by: Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx>
    Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 636bae0..a0f7309 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -963,7 +963,7 @@ static void raid1_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	struct r1conf *conf = mddev->private;
 	struct bio *bio;
 
-	if (from_schedule) {
+	if (from_schedule || current->bio_list) {
 		spin_lock_irq(&conf->device_lock);
 		bio_list_merge(&conf->pending_bio_list, &plug->pending);
 		conf->pending_count += plug->pending_cnt;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 0d5d0ff..c9acbd7 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1069,7 +1069,7 @@ static void raid10_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	struct r10conf *conf = mddev->private;
 	struct bio *bio;
 
-	if (from_schedule) {
+	if (from_schedule || current->bio_list) {
 		spin_lock_irq(&conf->device_lock);
 		bio_list_merge(&conf->pending_bio_list, &plug->pending);
 		conf->pending_count += plug->pending_cnt;
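
For readers who want to see the decision in isolation, here is a rough
userspace sketch of the check the patch adds. It is not kernel code:
struct task_model, struct conf_model, struct plug_model and
model_unplug() are hypothetical stand-ins invented for illustration,
and the "inline" branch only counts bios where the real unplug function
would call bitmap_unplug() and submit them. The one thing it mirrors
from the patch is the condition: defer when called from the scheduler
or when the caller's bio_list is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the task: bio_list is non-NULL while the task
 * is inside a make_request function (as current->bio_list is). */
struct task_model {
	void *bio_list;
};

/* Hypothetical model of the per-array state. */
struct conf_model {
	int pending_count;   /* bios handed off to the other thread */
	int flushed_inline;  /* bios handled directly by the caller */
};

/* Hypothetical model of the plug callback's private list. */
struct plug_model {
	int pending_cnt;
};

enum unplug_path { UNPLUG_DEFERRED, UNPLUG_INLINE };

static enum unplug_path model_unplug(struct conf_model *conf,
				     struct plug_model *plug,
				     const struct task_model *cur,
				     bool from_schedule)
{
	assert(conf && plug && cur);

	if (from_schedule || cur->bio_list) {
		/* Waiting here could deadlock: bios queued on the
		 * caller's bio_list cannot complete until the caller
		 * returns. Hand the work to another thread instead. */
		conf->pending_count += plug->pending_cnt;
		plug->pending_cnt = 0;
		return UNPLUG_DEFERRED;
	}

	/* No bios are held back by this task, so it is safe to do the
	 * bitmap write-out and submit the bios from this context. */
	conf->flushed_inline += plug->pending_cnt;
	plug->pending_cnt = 0;
	return UNPLUG_INLINE;
}
```

Calling model_unplug() with from_schedule false and a NULL bio_list
takes the inline path; setting either one moves the plugged bios onto
the pending count, which is the behaviour the one-line change gives
raid1_unplug() and raid10_unplug().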
