On Dec 29, 2007 9:48 AM, dean gaudet <dean@xxxxxxxxxx> wrote:
> hmm bummer, i'm doing another test (rsync 3.5M inodes from another box) on
> the same 64k chunk array and had raised the stripe_cache_size to 1024...
> and got a hang.  this time i grabbed stripe_cache_active before bumping
> the size again -- it was only 905 active.  as i recall the bug we were
> debugging a year+ ago the active was at the size when it would hang.  so
> this is probably something new.

I believe I am seeing the same issue and am trying to track down whether
XFS is doing something unexpected, i.e. I have not been able to reproduce
the problem with EXT3.

MD tries to increase throughput by letting some stripe work build up in
batches.  It looks like every time your system has hung it has been in the
'inactive_blocked' state, i.e. more than 3/4 of the stripes are active.
This state should automatically clear...

> anyhow raising it to 2048 got it unstuck, but i'm guessing i'll be able to
> hit that limit too if i try harder :)

Once you hang, if 'stripe_cache_size' is increased such that
stripe_cache_active < 3/4 * stripe_cache_size, things will start flowing
again.

> btw what units are stripe_cache_size/active in?  is the memory consumed
> equal to (chunk_size * raid_disks * stripe_cache_size) or (chunk_size *
> raid_disks * stripe_cache_active)?

memory_consumed = PAGE_SIZE * raid_disks * stripe_cache_size

(So the units of stripe_cache_size/active are stripes, each of which holds
one page per member disk; chunk_size does not enter into it.)

> -dean

--
Dan
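
To make the 'inactive_blocked' behaviour above concrete, here is a minimal
C sketch of the threshold logic as described in the message.  The struct
and function names are invented for illustration -- this is not the actual
drivers/md/raid5.c code; only the 3/4 thresholds come from the text.

    /*
     * Illustrative sketch only, not the real md/raid5 implementation.
     * Models the described behaviour: once more than 3/4 of the stripe
     * cache is active, new allocations block ("inactive_blocked"), and
     * they resume when active stripes drop back below 3/4 of the cache,
     * e.g. because stripe_cache_size was raised via sysfs.
     */
    #include <stdbool.h>

    struct stripe_cache {
            int size;               /* stripe_cache_size (sysfs tunable) */
            int active;             /* stripe_cache_active               */
            bool inactive_blocked;  /* set when the cache is saturated   */
    };

    /* Would a new stripe allocation have to wait? */
    static bool must_block(struct stripe_cache *sc)
    {
            if (sc->active > sc->size * 3 / 4)
                    sc->inactive_blocked = true;    /* let work batch up */
            return sc->inactive_blocked;
    }

    /* Re-check after stripes are released or stripe_cache_size grows. */
    static void maybe_unblock(struct stripe_cache *sc)
    {
            if (sc->active < sc->size * 3 / 4)
                    sc->inactive_blocked = false;   /* writes flow again */
    }

This also shows why raising stripe_cache_size unsticks a hang: it lowers
the active/size ratio below 3/4 without any stripes having to complete.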
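
Similarly, a small worked example of the memory formula.  The raid_disks
and stripe_cache_size values are assumptions (stripe_cache_size chosen to
match the 1024 setting in dean's test); PAGE_SIZE is read at runtime.

    /* Worked example of memory_consumed = PAGE_SIZE * raid_disks *
     * stripe_cache_size; the array geometry below is assumed. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            long page_size = sysconf(_SC_PAGESIZE); /* typically 4096    */
            long raid_disks = 4;                    /* assumed geometry  */
            long stripe_cache_size = 1024;          /* as in dean's test */

            long long bytes = (long long)page_size * raid_disks
                              * stripe_cache_size;
            /* 4096 * 4 * 1024 = 16 MiB with these numbers */
            printf("stripe cache consumes %lld bytes (%.1f MiB)\n",
                   bytes, bytes / (1024.0 * 1024.0));
            return 0;
    }

With 4 KiB pages and 4 member disks this comes to 16 MiB at
stripe_cache_size=1024; bumping it to 2048, as dean did, doubles that.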