On Sat, 29 Dec 2007, Dan Williams wrote:

> On Dec 29, 2007 9:48 AM, dean gaudet <dean@xxxxxxxxxx> wrote:
> > hmm bummer, i'm doing another test (rsync 3.5M inodes from another
> > box) on the same 64k chunk array and had raised the stripe_cache_size
> > to 1024... and got a hang.  this time i grabbed stripe_cache_active
> > before bumping the size again -- it was only 905 active.  as i recall
> > the bug we were debugging a year+ ago, the active count was at the
> > size when it would hang.  so this is probably something new.
>
> I believe I am seeing the same issue and am trying to track down
> whether XFS is doing something unexpected, i.e. I have not been able
> to reproduce the problem with EXT3.  MD tries to increase throughput
> by letting some stripe work build up in batches.  It looks like every
> time your system has hung it has been in the 'inactive_blocked' state,
> i.e. > 3/4 of stripes active.  This state should automatically
> clear...

cool, glad you can reproduce it :)  i have a bit more data...

i'm seeing the same problem on debian's 2.6.22-3-amd64 kernel, so it's
not new in 2.6.24.

i'm doing some more isolation, but so far i'm just grabbing kernels i
have precompiled -- a 2.6.19.7 kernel doesn't show the problem, and
early indications are that a 2.6.21.7 kernel doesn't have the problem
either, but i'm giving it longer to show its head.  i'll try a stock
2.6.22 next depending on how the 2.6.21 test goes, just so we get the
debian patches out of the way.

i was tempted to blame the async api because it's newish :) but
according to the dmesg output the 2.6.22-3-amd64 kernel doesn't appear
to have used the async API, and it still hung, so async is probably
not to blame.

anyhow, the test case i'm using is the dma_thrasher script i
attached... it takes about an hour to give me confidence there are no
problems, so this will take a while.
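for anyone following along, the knobs we're poking at live under sysfs.
a minimal sketch of the inspect/bump cycle described above, assuming the
array is md0 (substitute your own device; run as root):

```shell
# current stripe cache limit (number of cached stripe_heads)
cat /sys/block/md0/md/stripe_cache_size

# stripes currently in use -- during a hang, compare this against
# 3/4 of stripe_cache_size to see if md is in 'inactive_blocked'
cat /sys/block/md0/md/stripe_cache_active

# the workaround that has been unwedging the hang: raise the limit
echo 2048 > /sys/block/md0/md/stripe_cache_size
```

note the 2048 value is just an example; it costs memory (each stripe_head
pins a page per device in the array), so don't raise it arbitrarily.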
-dean