Re: task mdadm blocked when stopping array, 3.15rc3

On Sat, 3 May 2014 17:16:18 -0600 Chris Murphy <lists@xxxxxxxxxxxxxxxxx>
wrote:

> When I issue mdadm -S /dev/md0, I get a hang which does not recover after 30+ minutes. This is what appears in dmesg (partial). I have also issued sysrq-w and attached a follow-up dmesg and journalctl to this kernel bug, because the output is so wide it just looks ugly in email:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=75451

Thanks for the report.
The patch below should fix it. I'll send it upstream shortly.

I don't think the systemd-udevd messages are relevant... I wonder what they
mean, though.

NeilBrown

From bbba3bc5932a56fdaeecfda87597c1cac5d84803 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@xxxxxxx>
Date: Mon, 5 May 2014 13:34:37 +1000
Subject: [PATCH] md/raid10: call wait_barrier() for each request submitted.

wait_barrier() includes a counter, so we must call it precisely once
(unless balanced by allow_barrier()) for each request submitted.

Since
commit 20d0189b1012a37d2533a87fb451f7852f2418d1
    block: Introduce new bio_split()
in 3.14-rc1, we don't call it for the extra requests generated when
we need to split a bio.

When this happens, the counter goes negative, any resync/recovery will
never start, and "mdadm --stop" will hang.
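The counter imbalance can be illustrated with a small user-space model. This is a simplified sketch under stated assumptions, not kernel code: struct model_conf and the model_* functions are hypothetical stand-ins for wait_barrier(), allow_barrier() and the nr_pending check in raise_barrier(); there is no locking or sleeping, and each piece's completion (where the driver calls allow_barrier(), roughly once per bio handed to __make_request()) is collapsed into its submission.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, single-threaded model of the raid10 barrier counter.
 * The real struct r10conf also has a barrier count, a spinlock and a
 * wait queue; none of that is modelled here.
 */
struct model_conf {
	int nr_pending; /* requests registered by wait_barrier() */
};

/* Models wait_barrier() when no barrier is raised: register a request. */
static void model_wait_barrier(struct model_conf *conf)
{
	conf->nr_pending++;
}

/* Models allow_barrier(): run once when each request finishes. */
static void model_allow_barrier(struct model_conf *conf)
{
	conf->nr_pending--;
}

/*
 * Models the nr_pending part of the condition raise_barrier() sleeps on
 * before resync/recovery can start: no registered requests in flight.
 */
static bool model_barrier_can_rise(const struct model_conf *conf)
{
	return conf->nr_pending == 0;
}

/*
 * Models __make_request() for one piece of a (possibly split) bio.
 * Completion is collapsed into submission: in this model each piece
 * ends with exactly one allow_barrier().
 */
static void model_submit_piece(struct model_conf *conf, bool fixed)
{
	if (fixed)
		model_wait_barrier(conf); /* after the patch: once per piece */
	model_allow_barrier(conf);        /* the piece completes */
}

/* Models make_request() splitting one incoming bio into npieces. */
static void model_make_request(struct model_conf *conf, int npieces,
			       bool fixed)
{
	assert(npieces >= 1);
	if (!fixed)
		model_wait_barrier(conf); /* before the patch: once per bio */
	for (int i = 0; i < npieces; i++)
		model_submit_piece(conf, fixed);
}
```

In this model, splitting one bio into three pieces leaves nr_pending at -2 with the old ordering and at 0 with the patch applied. An unsplit bio balances either way, which is consistent with the hang only appearing once bio_split() produces extra requests.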

Reported-by: Chris Murphy <lists@xxxxxxxxxxxxxxxxx>
Fixes: 20d0189b1012a37d2533a87fb451f7852f2418d1
Cc: stable@xxxxxxxxxxxxxxx (3.14+)
Cc: Kent Overstreet <kmo@xxxxxxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 33fc408e5eac..cb882aae9e20 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1172,6 +1172,13 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
 	int max_sectors;
 	int sectors;
 
+	/*
+	 * Register the new request and wait if the reconstruction
+	 * thread has put up a bar for new requests.
+	 * Continue immediately if no resync is active currently.
+	 */
+	wait_barrier(conf);
+
 	sectors = bio_sectors(bio);
 	while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    bio->bi_iter.bi_sector < conf->reshape_progress &&
@@ -1552,12 +1559,6 @@ static void make_request(struct mddev *mddev, struct bio *bio)
 
 	md_write_start(mddev, bio);
 
-	/*
-	 * Register the new request and wait if the reconstruction
-	 * thread has put up a bar for new requests.
-	 * Continue immediately if no resync is active currently.
-	 */
-	wait_barrier(conf);
 
 	do {
 


