Re: raid5 hang on get_active_stripe

On Saturday May 27, dean@xxxxxxxxxx wrote:
> On Sat, 27 May 2006, Neil Brown wrote:
> 
> > Thanks.  This narrows it down quite a bit... too much in fact:  I can
> > now say for sure that this cannot possibly happen :-)
> > 
> >   2/ The message.gz you sent earlier with the
> >           echo t > /proc/sysrq-trigger
> >      trace in it didn't contain information about md4_raid5 - the 
> 
> got another hang again this morning... full dmesg output attached.
> 

Thanks.  Nothing surprising there, which is perhaps a surprise in itself...

I'm still somewhat stumped by this.  But given that it is nicely
repeatable, I'm sure we can get there...

The following patch adds some more tracing to raid5, and might fix a
subtle bug in ll_rw_blk, though it is an incredibly long shot that
this could be affecting raid5 (if it is, I'll have to assume there is
another bug somewhere).   It certainly doesn't break ll_rw_blk;
whether it actually fixes anything, I'm not sure.
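
For anyone following along, here is a small userspace sketch of the kind
of lost update the ll_rw_blk change guards against.  It is only a model:
the bit numbers and function names are made up for illustration, C11
atomics stand in for the kernel's set_bit(), and FLAG_OTHER is a
hypothetical neighbour in the same flags word, not a claim about which
flag (if any) is really being lost.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical bit numbers for illustration, not the kernel's values. */
enum { FLAG_QUEUED = 0, FLAG_OTHER = 1 };

/* Turn a bit number into a mask for the flags word. */
unsigned long flag_mask(int bit)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * CHAR_BIT));
	return 1UL << bit;
}

/*
 * Replay the unlucky interleaving of two plain "flags |= mask" updates:
 * both CPUs load the old word before either one stores its result.
 */
unsigned long racy_interleaving(unsigned long flags)
{
	unsigned long a = flags;        /* CPU A loads the word */
	unsigned long b = flags;        /* CPU B loads the same old word */

	a |= flag_mask(FLAG_QUEUED);    /* A sets its bit in its private copy */
	b |= flag_mask(FLAG_OTHER);     /* B sets its bit in its private copy */

	flags = a;                      /* A stores */
	flags = b;                      /* B stores and wipes out A's bit */
	return flags;
}

/* The same two updates done as atomic read-modify-write operations. */
unsigned long atomic_updates(unsigned long initial)
{
	atomic_ulong flags;

	atomic_init(&flags, initial);
	atomic_fetch_or(&flags, flag_mask(FLAG_QUEUED));
	atomic_fetch_or(&flags, flag_mask(FLAG_OTHER));
	return atomic_load(&flags);
}
```

Starting from an empty word, racy_interleaving() ends up with only
FLAG_OTHER set, while atomic_updates() keeps both bits.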

If you could try this on top of the previous patches I'd really
appreciate it.

When you read from ..../stripe_cache_active, it should trigger a
(cryptic) kernel message within the next 15 seconds.  If I could get
the contents of that file and the kernel messages, that should help.
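
In case the mechanism is not obvious from the diff, the one-shot trigger
can be modelled in userspace roughly as below.  This is a sketch under
assumptions: arm_trace() and daemon_pass() are hypothetical stand-ins for
the sysfs show routine and for raid5d, and a C11 atomic_fetch_and() plays
the part of test_and_clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the patch's "static unsigned long trigger". */
static atomic_ulong trigger;

/* Userspace model of test_and_clear_bit(): clear the bit, report its old value. */
bool test_and_clear(int bit, atomic_ulong *word)
{
	assert(bit >= 0 && bit < (int)(sizeof(unsigned long) * CHAR_BIT));
	unsigned long mask = 1UL << bit;

	return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Models the read of the sysfs file: it arms the trace as a side effect. */
void arm_trace(void)
{
	/* Same value as the patch; only bit 0 is ever tested below. */
	atomic_store(&trigger, 0xffffUL);
}

/* Models one run of the daemon; returns true if this run was traced. */
bool daemon_pass(void)
{
	bool trace = test_and_clear(0, &trigger);

	if (trace)
		printf("daemon runs (traced)\n");
	/* ... the normal work would happen here ... */
	if (trace)
		printf("daemon done (traced)\n");
	return trace;
}
```

Calling arm_trace() once makes exactly the next daemon_pass() print its
messages; later passes stay quiet until the trace is armed again.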

Thanks heaps,

NeilBrown


Signed-off-by: Neil Brown <neilb@xxxxxxx>

### Diffstat output
 ./block/ll_rw_blk.c  |    4 ++--
 ./drivers/md/raid5.c |   18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff ./block/ll_rw_blk.c~current~ ./block/ll_rw_blk.c
--- ./block/ll_rw_blk.c~current~	2006-05-28 21:54:23.000000000 +1000
+++ ./block/ll_rw_blk.c	2006-05-28 21:55:17.000000000 +1000
@@ -874,7 +874,7 @@ static void __blk_queue_free_tags(reques
 	}
 
 	q->queue_tags = NULL;
-	q->queue_flags &= ~(1 << QUEUE_FLAG_QUEUED);
+	clear_bit(QUEUE_FLAG_QUEUED, &q->queue_flags);
 }
 
 /**
@@ -963,7 +963,7 @@ int blk_queue_init_tags(request_queue_t 
 	 * assign it, all done
 	 */
 	q->queue_tags = tags;
-	q->queue_flags |= (1 << QUEUE_FLAG_QUEUED);
+	set_bit(QUEUE_FLAG_QUEUED, &q->queue_flags);
 	return 0;
 fail:
 	kfree(tags);

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-05-27 09:17:10.000000000 +1000
+++ ./drivers/md/raid5.c	2006-05-28 21:56:56.000000000 +1000
@@ -1701,13 +1701,20 @@ static sector_t sync_request(mddev_t *md
  * During the scan, completed stripes are saved for us by the interrupt
  * handler, so that they will not have to wait for our next wakeup.
  */
+static unsigned long trigger;
+
 static void raid5d (mddev_t *mddev)
 {
 	struct stripe_head *sh;
 	raid5_conf_t *conf = mddev_to_conf(mddev);
 	int handled;
+	int trace = 0;
 
 	PRINTK("+++ raid5d active\n");
+	if (test_and_clear_bit(0, &trigger))
+		trace = 1;
+	if (trace)
+		printk("raid5d runs\n");
 
 	md_check_recovery(mddev);
 
@@ -1725,6 +1732,13 @@ static void raid5d (mddev_t *mddev)
 			activate_bit_delay(conf);
 		}
 
+		if (trace)
+			printk(" le=%d, pas=%d, bqp=%d le=%d\n",
+			       list_empty(&conf->handle_list),
+			       atomic_read(&conf->preread_active_stripes),
+			       blk_queue_plugged(mddev->queue),
+			       list_empty(&conf->delayed_list));
+
 		if (list_empty(&conf->handle_list) &&
 		    atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD &&
 		    !blk_queue_plugged(mddev->queue) &&
@@ -1756,6 +1770,8 @@ static void raid5d (mddev_t *mddev)
 	unplug_slaves(mddev);
 
 	PRINTK("--- raid5d inactive\n");
+	if (trace)
+		printk("raid5d done\n");
 }
 
 static ssize_t
@@ -1813,6 +1829,7 @@ stripe_cache_active_show(mddev_t *mddev,
 		struct list_head *l;
 		n = sprintf(page, "%d\n", atomic_read(&conf->active_stripes));
 		n += sprintf(page+n, "%d preread\n", atomic_read(&conf->preread_active_stripes));
+		n += sprintf(page+n, "%splugged\n", blk_queue_plugged(mddev->queue)?"":"not ");
 		spin_lock_irq(&conf->device_lock);
 		c1=0;
 		list_for_each(l, &conf->bitmap_list)
@@ -1822,6 +1839,7 @@ stripe_cache_active_show(mddev_t *mddev,
 			c2++;
 		spin_unlock_irq(&conf->device_lock);
 		n += sprintf(page+n, "bitlist=%d delaylist=%d\n", c1, c2);
+		trigger = 0xffff;
 		return n;
 	} else
 		return 0;