Re: Raid5 hang in 3.14.19

On 09/29/2014 04:59 PM, NeilBrown wrote:
On Sun, 28 Sep 2014 23:28:17 -0500 BillStuff <billstuff2001@xxxxxxxxxxxxx>
wrote:

On 09/28/2014 11:08 PM, NeilBrown wrote:
On Sun, 28 Sep 2014 22:56:19 -0500 BillStuff <billstuff2001@xxxxxxxxxxxxx>
wrote:

On 09/28/2014 09:25 PM, NeilBrown wrote:
On Fri, 26 Sep 2014 17:33:58 -0500 BillStuff <billstuff2001@xxxxxxxxxxxxx>
wrote:

Hi Neil,

I found something that looks similar to the problem described in
"Re: seems like a deadlock in workqueue when md do a flush" from Sept 14th.

It's on 3.14.19 with 7 recent patches for fixing raid1 recovery hangs.

on this array:
md3 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
          104171200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
          bitmap: 1/5 pages [4KB], 2048KB chunk

I was running a test doing parallel kernel builds, read/write loops, and
disk add / remove / check loops,
on both this array and a raid1 array.

I was trying to stress test your recent raid1 fixes, which went well,
but then after 5 days,
the raid5 array hung up with this in dmesg:
I think this is different to the workqueue problem you mentioned, though as I
don't know exactly what caused either I cannot be certain.

From the data you provided it looks like everything is waiting on
get_active_stripe(), or on a process that is waiting on that.
That seems pretty common whenever anything goes wrong in raid5 :-(

The md3_raid5 task is listed as blocked, but no stack trace is given.
If the machine is still in the state, then

    cat /proc/1698/stack

might be useful.
(echo t > /proc/sysrq-trigger is always a good idea)
Might this help? I believe the array was doing a "check" when things
hung up.
It looks like it was trying to start doing a 'check'.
The 'resync' thread hadn't been started yet.
What is 'kthreadd' doing?
My guess is that it is in try_to_free_pages() waiting for writeout
of some xfs file page onto the md array ... which won't progress until
the thread gets started.

That would suggest that we need an async way to start threads...
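
As a rough illustration of that idea (not md's actual code; the patch below is
what really does this, and every name in this sketch is hypothetical), deferring
the kthread creation to a workqueue so the requesting context never waits on
kthreadd's allocations could look something like:

/*
 * Hypothetical sketch only: defer thread creation to a workqueue so the
 * requesting context returns immediately and cannot deadlock if kthreadd
 * blocks in memory reclaim.  my_dev, my_request_thread etc. are made up.
 */
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

struct my_dev {
	struct work_struct start_work;
	struct task_struct *thread;
};

static int my_thread_fn(void *data)
{
	/* placeholder work loop */
	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		schedule();
	}
	return 0;
}

/* Runs in workqueue context; a stall here no longer blocks the caller. */
static void my_start_thread(struct work_struct *ws)
{
	struct my_dev *dev = container_of(ws, struct my_dev, start_work);

	dev->thread = kthread_run(my_thread_fn, dev, "my_thread");
	if (IS_ERR(dev->thread))
		dev->thread = NULL;	/* report / clean up as appropriate */
}

/* The caller just queues the work and returns, instead of calling
 * kthread_run() directly while other progress depends on it. */
static void my_request_thread(struct my_dev *dev)
{
	INIT_WORK(&dev->start_work, my_start_thread);
	queue_work(system_wq, &dev->start_work);
}

The actual patch below does the same thing with md's existing del_work and
md_misc_wq rather than introducing new infrastructure.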

Thanks,
NeilBrown

I suspect your guess is correct:
Thanks for the confirmation.

I'm thinking of something like the following.  Very basic testing suggests
it doesn't instantly crash.

If you were to apply this patch and run your test for a week or two,  that
would increase my confidence (though of course testing doesn't prove the
absence of bugs....)

Thanks,
NeilBrown

Got it running. I'll let you know if anything interesting happens.

Thanks,
Bill


diff --git a/drivers/md/md.c b/drivers/md/md.c
index a79e51d15c2b..580d4b97696c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7770,6 +7770,33 @@ no_add:
  	return spares;
  }
+static void md_start_sync(struct work_struct *ws)
+{
+	struct mddev *mddev = container_of(ws, struct mddev, del_work);
+
+	mddev->sync_thread = md_register_thread(md_do_sync,
+						mddev,
+						"resync");
+	if (!mddev->sync_thread) {
+		printk(KERN_ERR "%s: could not start resync"
+		       " thread...\n",
+		       mdname(mddev));
+		/* leave the spares where they are, it shouldn't hurt */
+		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+		clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+		clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
+		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+		clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+		if (test_and_clear_bit(MD_RECOVERY_RECOVER,
+				       &mddev->recovery))
+			if (mddev->sysfs_action)
+				sysfs_notify_dirent_safe(mddev->sysfs_action);
+	} else
+		md_wakeup_thread(mddev->sync_thread);
+	sysfs_notify_dirent_safe(mddev->sysfs_action);
+	md_new_event(mddev);
+}
+
  /*
   * This routine is regularly called by all per-raid-array threads to
   * deal with generic issues like resync and super-block update.
@@ -7823,6 +7850,7 @@ void md_check_recovery(struct mddev *mddev)
 
 	if (mddev_trylock(mddev)) {
 		int spares = 0;
+		bool sync_starting = false;
 
 		if (mddev->ro) {
  			/* On a read-only array we can:
@@ -7921,28 +7949,14 @@ void md_check_recovery(struct mddev *mddev)
  				 */
  				bitmap_write_all(mddev->bitmap);
  			}
-			mddev->sync_thread = md_register_thread(md_do_sync,
-								mddev,
-								"resync");
-			if (!mddev->sync_thread) {
-				printk(KERN_ERR "%s: could not start resync"
-					" thread...\n",
-					mdname(mddev));
-				/* leave the spares where they are, it shouldn't hurt */
-				clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-				clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
-				clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-				clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
-				clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
-			} else
-				md_wakeup_thread(mddev->sync_thread);
-			sysfs_notify_dirent_safe(mddev->sysfs_action);
-			md_new_event(mddev);
+			INIT_WORK(&mddev->del_work, md_start_sync);
+			queue_work(md_misc_wq, &mddev->del_work);
+			sync_starting = true;
  		}
  	unlock:
  		wake_up(&mddev->sb_wait);
-		if (!mddev->sync_thread) {
+		if (!mddev->sync_thread && !sync_starting) {
  			clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
  			if (test_and_clear_bit(MD_RECOVERY_RECOVER,
  					       &mddev->recovery))

