On Sat, 3 May 2014 17:16:18 -0600 Chris Murphy <lists@xxxxxxxxxxxxxxxxx>
wrote:

> When I issue mdadm -S /dev/md0, I get a hang which does not recover after
> 30+ minutes. This is what appears in dmesg (partial), but I also have
> issued sysrq-w and included a followup dmesg and journalctl both of which
> are attached to this kernel bug because it's so wide it just looks ugly in
> email:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=75451

Thanks for the report. Patch below should fix it.
I'll send it upstream shortly.

I don't think the systemd-udevd messages are relevant.... I wonder what they
mean though.

NeilBrown

From bbba3bc5932a56fdaeecfda87597c1cac5d84803 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@xxxxxxx>
Date: Mon, 5 May 2014 13:34:37 +1000
Subject: [PATCH] md/raid10: call wait_barrier() for each request submitted.

wait_barrier() includes a counter, so we must call it precisely once
(unless balanced by allow_barrier()) for each request submitted.

Since
commit 20d0189b1012a37d2533a87fb451f7852f2418d1
    block: Introduce new bio_split()
in 3.14-rc1, we don't call it for the extra requests generated when
we need to split a bio.

When this happens the counter goes negative, any resync/recovery will
never start, and "mdadm --stop" will hang.

Reported-by: Chris Murphy <lists@xxxxxxxxxxxxxxxxx>
Fixes: 20d0189b1012a37d2533a87fb451f7852f2418d1
Cc: stable@xxxxxxxxxxxxxxx (3.14+)
Cc: Kent Overstreet <kmo@xxxxxxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 33fc408e5eac..cb882aae9e20 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1172,6 +1172,13 @@ static void __make_request(struct mddev *mddev, struct bio *bio)
 	int max_sectors;
 	int sectors;
 
+	/*
+	 * Register the new request and wait if the reconstruction
+	 * thread has put up a bar for new requests.
+	 * Continue immediately if no resync is active currently.
+	 */
+	wait_barrier(conf);
+
 	sectors = bio_sectors(bio);
 	while (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
 	    bio->bi_iter.bi_sector < conf->reshape_progress &&
@@ -1552,12 +1559,6 @@ static void make_request(struct mddev *mddev, struct bio *bio)
 
 	md_write_start(mddev, bio);
 
-	/*
-	 * Register the new request and wait if the reconstruction
-	 * thread has put up a bar for new requests.
-	 * Continue immediately if no resync is active currently.
-	 */
-	wait_barrier(conf);
 
 	do {
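
For anyone wanting to see the counter imbalance in isolation, here is a
rough user-space sketch. It is not the raid10 code: struct barrier_model,
the model_* helpers and the two submit_* functions are hypothetical names
made up for illustration, and it assumes (as a simplification) that
wait_barrier() increments a pending counter, that each completed piece of a
split bio decrements it via allow_barrier(), and that resync can only start
once the counter is back at zero. The real functions also take locks and
sleep, which is left out here.

```c
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct r10conf. */
struct barrier_model {
	int nr_pending;	/* requests registered but not yet completed */
};

/* Model of wait_barrier(): register one in-flight request. */
void model_wait_barrier(struct barrier_model *m)
{
	m->nr_pending++;
}

/* Model of allow_barrier(): one request has completed. */
void model_allow_barrier(struct barrier_model *m)
{
	m->nr_pending--;
}

/* Assumption: resync/recovery may start only when nothing is pending. */
bool model_resync_may_start(const struct barrier_model *m)
{
	return m->nr_pending == 0;
}

/*
 * Pre-patch behaviour: register once for the incoming bio, but
 * complete once for every piece it was split into.
 */
int submit_prepatch(struct barrier_model *m, int pieces)
{
	model_wait_barrier(m);
	for (int i = 0; i < pieces; i++)
		model_allow_barrier(m);
	return m->nr_pending;
}

/*
 * Post-patch behaviour: register once per piece, so that every
 * completion is balanced by exactly one registration.
 */
int submit_postpatch(struct barrier_model *m, int pieces)
{
	for (int i = 0; i < pieces; i++)
		model_wait_barrier(m);
	for (int i = 0; i < pieces; i++)
		model_allow_barrier(m);
	return m->nr_pending;
}
```

Starting from a zeroed struct barrier_model, submit_prepatch() with two
pieces leaves nr_pending at -1, so model_resync_may_start() never becomes
true again; submit_postpatch() with the same input returns it to 0.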