Re: New RAID causing system lockups

On Mon, 13 Sep 2010 11:57:03 -0400
Mike Hartman <mike@xxxxxxxxxxxxxxxxxxxx> wrote:

> >>> I don't know yet what is causing the lock-up.  A quick look at your logs
> >>> suggest that it could be related to the barrier handling.  Maybe trying to
> >>> handle a barrier during a reshape is prone to races of some sort - I wouldn't
> >>> be very surprised by that.
> >>
> >> Just note that during the second lockup no reshape or resync was going
> >> on. The array state was stable, I was just writing to it.
> >>
> >>>
> >>> I'll have a look at the code and see what I can find.
> >>
> >> Thanks a lot. If it was only a risk when I was growing/reshaping the
> >> array, and covered by the backup file, it would just be an
> >> inconvenience. But since it can seemingly happen at any time it's a
> >> problem.
> >>
> >
> > The lockup just happened again. I wasn't doing any
> > growing/reshaping/anything like that. Just copying some data into the
> > partition that lives on md0. dmesg_3.txt has been uploaded alongside
> > the other files at http://www.hartmanipulation.com/raid/. The trace
> > looks pretty similar to me.
> >
> 
> The lockup just happened for the fourth time, less than an hour after
> I rebooted to clear the previous lockup from last night. All I did was
> boot the system, start the RAID, and start copying some files onto it.
> The problem seems to be getting worse - up until now I got at least a
> full day of fairly heavy usage out of the system before it happened.
> dmesg_4.txt has been uploaded alongside the other files. Let me know
> if there's any other system information that would be useful.
> 
> Mike

Hi Mike,
 thanks for the updates.

I'm not entirely clear what is happening (in fact, due to a cold that I am
still fighting off, nothing is entirely clear at the moment), but it looks
very likely that the problem is due to an interplay between barrier handling
and the multi-level structure of your array (a raid0 being a member of a
raid5).

When a barrier request is processed, both arrays will schedule 'work' to be
done by the 'event' thread, and I'm guessing that you can get into a situation
where one work item is waiting for the other, but the other is queued behind
the first on the single event queue (I wonder if that makes sense...)
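
To make that scenario concrete, here is a minimal sketch (a hypothetical
stand-alone module, not md code) of the suspected pattern: two work items
queued on the shared single-threaded event queue, where the first blocks
waiting for a completion that only the second can signal, so neither can
make progress.

#include <linux/module.h>
#include <linux/init.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

static DECLARE_COMPLETION(lower_done);

/* stands in for the "lower" array finishing its part of the barrier */
static void lower_work_fn(struct work_struct *ws)
{
	complete(&lower_done);
}

/* stands in for the "upper" array: blocks the shared event thread,
 * so lower_work_fn (queued behind it) never gets to run */
static void upper_work_fn(struct work_struct *ws)
{
	wait_for_completion(&lower_done);
}

static DECLARE_WORK(upper_work, upper_work_fn);
static DECLARE_WORK(lower_work, lower_work_fn);

static int __init barrier_demo_init(void)
{
	schedule_work(&upper_work);	/* queued first, runs first */
	schedule_work(&lower_work);	/* stuck behind upper_work */
	return 0;
}
module_init(barrier_demo_init);
MODULE_LICENSE("GPL");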

Anyway, this patch might make a difference.  It reduces the number of work
items scheduled, in a way that could conceivably fix the problem.

If you can test this, please report the results.  I cannot easily reproduce
the problem so there is limited testing that I can do.

Thanks,
NeilBrown


diff --git a/drivers/md/md.c b/drivers/md/md.c
index f20d13e..7f2785c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
 
 #define POST_REQUEST_BARRIER ((void*)1)
 
+static void md_barrier_done(mddev_t *mddev)
+{
+	struct bio *bio = mddev->barrier;
+
+	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
+		bio_endio(bio, -EOPNOTSUPP);
+	else if (bio->bi_size == 0)
+		bio_endio(bio, 0);
+	else {
+		/* other options need to be handled from process context */
+		schedule_work(&mddev->barrier_work);
+		return;
+	}
+	mddev->barrier = NULL;
+	wake_up(&mddev->sb_wait);
+}
+
 static void md_end_barrier(struct bio *bio, int err)
 {
 	mdk_rdev_t *rdev = bio->bi_private;
@@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
 			wake_up(&mddev->sb_wait);
 		} else
 			/* The pre-request barrier has finished */
-			schedule_work(&mddev->barrier_work);
+			md_barrier_done(mddev);
 	}
 	bio_put(bio);
 }
@@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
 
 	atomic_set(&mddev->flush_pending, 1);
 
-	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
-		bio_endio(bio, -EOPNOTSUPP);
-	else if (bio->bi_size == 0)
-		/* an empty barrier - all done */
-		bio_endio(bio, 0);
-	else {
-		bio->bi_rw &= ~REQ_HARDBARRIER;
-		if (mddev->pers->make_request(mddev, bio))
-			generic_make_request(bio);
-		mddev->barrier = POST_REQUEST_BARRIER;
-		submit_barriers(mddev);
-	}
+	bio->bi_rw &= ~REQ_HARDBARRIER;
+	if (mddev->pers->make_request(mddev, bio))
+		generic_make_request(bio);
+	mddev->barrier = POST_REQUEST_BARRIER;
+	submit_barriers(mddev);
+
 	if (atomic_dec_and_test(&mddev->flush_pending)) {
 		mddev->barrier = NULL;
 		wake_up(&mddev->sb_wait);
@@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
 	submit_barriers(mddev);
 
 	if (atomic_dec_and_test(&mddev->flush_pending))
-		schedule_work(&mddev->barrier_work);
+		md_barrier_done(mddev);
 }
 EXPORT_SYMBOL(md_barrier_request);
 

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

