On Saturday May 27, dean@xxxxxxxxxx wrote:
> On Sat, 27 May 2006, Neil Brown wrote:
>
> > Thanks.  This narrows it down quite a bit... too much in fact:  I can
> > now say for sure that this cannot possibly happen :-)
> >
> >   2/ The message.gz you sent earlier with the
> >           echo t > /proc/sysrq-trigger
> >      trace in it didn't contain information about md4_raid5 - the
>
> got another hang again this morning... full dmesg output attached.
>

Thanks.  Nothing surprising there, which maybe is a surprise itself...

I'm still somewhat stumped by this.  But given that it is nicely
repeatable, I'm sure we can get there...

The following patch adds some more tracing to raid5, and might fix a
subtle bug in ll_rw_blk, though it is an incredibly long shot that
this could be affecting raid5 (if it is, I'll have to assume there is
another bug somewhere).  It certainly doesn't break ll_rw_blk.
Whether it actually fixes something I'm not sure.

If you could try with these on top of the previous patches I'd really
appreciate it.

When you read from ..../stripe_cache_active, it should trigger a
(cryptic) kernel message within the next 15 seconds.  If I could get
the contents of that file and the kernel messages, that should help.

Thanks heaps,
NeilBrown


Signed-off-by: Neil Brown <neilb@xxxxxxx>

### Diffstat output
 ./block/ll_rw_blk.c  |    4 ++--
 ./drivers/md/raid5.c |   18 ++++++++++++++++++
 2 files changed, 20 insertions(+), 2 deletions(-)

diff ./block/ll_rw_blk.c~current~ ./block/ll_rw_blk.c
--- ./block/ll_rw_blk.c~current~	2006-05-28 21:54:23.000000000 +1000
+++ ./block/ll_rw_blk.c	2006-05-28 21:55:17.000000000 +1000
@@ -874,7 +874,7 @@ static void __blk_queue_free_tags(reques
 	}
 
 	q->queue_tags = NULL;
-	q->queue_flags &= ~(1 << QUEUE_FLAG_QUEUED);
+	clear_bit(QUEUE_FLAG_QUEUED, &q->queue_flags);
 }
 
 /**
@@ -963,7 +963,7 @@ int blk_queue_init_tags(request_queue_t
 	 * assign it, all done
 	 */
 	q->queue_tags = tags;
-	q->queue_flags |= (1 << QUEUE_FLAG_QUEUED);
+	set_bit(QUEUE_FLAG_QUEUED, &q->queue_flags);
 	return 0;
 fail:
 	kfree(tags);

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-05-27 09:17:10.000000000 +1000
+++ ./drivers/md/raid5.c	2006-05-28 21:56:56.000000000 +1000
@@ -1701,13 +1701,20 @@ static sector_t sync_request(mddev_t *md
  * During the scan, completed stripes are saved for us by the interrupt
  * handler, so that they will not have to wait for our next wakeup.
  */
+static unsigned long trigger;
+
 static void raid5d (mddev_t *mddev)
 {
 	struct stripe_head *sh;
 	raid5_conf_t *conf = mddev_to_conf(mddev);
 	int handled;
+	int trace = 0;
 
 	PRINTK("+++ raid5d active\n");
+	if (test_and_clear_bit(0, &trigger))
+		trace = 1;
+	if (trace)
+		printk("raid5d runs\n");
 
 	md_check_recovery(mddev);
 
@@ -1725,6 +1732,13 @@ static void raid5d (mddev_t *mddev)
 			activate_bit_delay(conf);
 		}
 
+		if (trace)
+			printk(" le=%d, pas=%d, bqp=%d le=%d\n",
+			       list_empty(&conf->handle_list),
+			       atomic_read(&conf->preread_active_stripes),
+			       blk_queue_plugged(mddev->queue),
+			       list_empty(&conf->delayed_list));
+
 		if (list_empty(&conf->handle_list) &&
 		    atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD &&
 		    !blk_queue_plugged(mddev->queue) &&
@@ -1756,6 +1770,8 @@ static void raid5d (mddev_t *mddev)
 	unplug_slaves(mddev);
 
 	PRINTK("--- raid5d inactive\n");
+	if (trace)
+		printk("raid5d done\n");
 }
 
 static ssize_t
@@ -1813,6 +1829,7 @@ stripe_cache_active_show(mddev_t *mddev,
 		struct list_head *l;
 		n = sprintf(page, "%d\n", atomic_read(&conf->active_stripes));
 		n += sprintf(page+n, "%d preread\n", atomic_read(&conf->preread_active_stripes));
+		n += sprintf(page+n, "%splugged\n", blk_queue_plugged(mddev->queue)?"":"not ");
 		spin_lock_irq(&conf->device_lock);
 		c1=0;
 		list_for_each(l, &conf->bitmap_list)
@@ -1822,6 +1839,7 @@ stripe_cache_active_show(mddev_t *mddev,
 			c2++;
 		spin_unlock_irq(&conf->device_lock);
 		n += sprintf(page+n, "bitlist=%d delaylist=%d\n", c1, c2);
+		trigger = 0xffff;
 		return n;
 	} else
 		return 0;
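
For readers unfamiliar with the bit operations the patch relies on, here is a rough userspace sketch of their semantics. It is an illustration only: the `sketch_*` names are hypothetical, the helpers are built on C11 `<stdatomic.h>` rather than the kernel's own implementations, and `sketch_arm_trace()` / `sketch_should_trace()` merely stand in for the `trigger = 0xffff` assignment in stripe_cache_active_show() and the `test_and_clear_bit(0, &trigger)` check at the top of raid5d(). The point it tries to show is that a plain `flags |= (1 << bit)` is a separate load, modify and store, so a concurrent update to another bit in the same word can be lost, whereas an atomic read-modify-write cannot lose it; and that testing and clearing bit 0 in one step arms exactly one traced run per read of the sysfs file.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-ins modelling the semantics of the kernel's
 * set_bit(), clear_bit() and test_and_clear_bit().
 * Assumption: these are sketches on C11 atomics, not kernel code.
 */
static inline void sketch_set_bit(unsigned int nr, atomic_ulong *addr)
{
	/* One indivisible read-modify-write, unlike "*addr |= mask". */
	atomic_fetch_or(addr, 1UL << nr);
}

static inline void sketch_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

static inline bool sketch_test_and_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	/* Clear the bit and report whether it was set beforehand. */
	unsigned long old = atomic_fetch_and(addr, ~(1UL << nr));

	return (old & (1UL << nr)) != 0;
}

/* Mirrors "static unsigned long trigger;" from the patch. */
static atomic_ulong trigger;

/* Hypothetical stand-in for the sysfs read path arming the trace. */
static void sketch_arm_trace(void)
{
	atomic_store(&trigger, 0xffffUL);
}

/* Hypothetical stand-in for the check at the top of raid5d(). */
static bool sketch_should_trace(void)
{
	return sketch_test_and_clear_bit(0, &trigger);
}
```

In this model, the first call to sketch_should_trace() after sketch_arm_trace() returns true and every later call returns false until the trace is armed again, which matches the one message burst per read described above.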