Hello Dan ,
On Mon, 27 Aug 2007, Dan Williams wrote:
On 8/25/07, Mr. James W. Laferriere <babydr@xxxxxxxxxxxxxxxx> wrote:
On Mon, 20 Aug 2007, Dan Williams wrote:
On 8/18/07, Mr. James W. Laferriere <babydr@xxxxxxxxxxxxxxxx> wrote:
Hello All , Here we go again . Again attempting to do bonnie++ testing
on a small array .
Kernel 2.6.22.1
Patches involved ,
IOP1 , 2.6.22.1-iop1 for improved sequential write performance
(stripe-queue) , Dan Williams <dan.j.williams@xxxxxxxxx>
Hello James,
Thanks for the report.
I tried to reproduce this on my system, no luck.
Possibly because there are significant hardware differences ?
See 'lspci -v' below the .sig .
However it looks
like there is a potential race between 'handle_queue' and
'add_queue_bio'. The attached patch moves these critical sections
under spin_lock(&sq->lock), and adds some debugging output if this BUG
triggers. It also includes a fix for retry_aligned_read which is
unrelated to this debug.
--
Dan
Applied your patch . The same 'kernel BUG at drivers/md/raid5.c:3689!'
messages appear (see attached) . The system is still responsive with your
patch ; without it the kernel crashed last time . However , the bonnie++ run is stuck in 'D' state .
And doing a '> /md3/asdf' stays hung even after passing the parent process a
'kill -9' .
Any further info you can think of that I can/should gather , I will try to acquire .
But I'll have to repeat these steps to attempt to get the same results .
I'll be shutting the system down after sending this off .
Fyi , the previous 'BUG' without your patch was quite repeatable .
I might have time over the next couple of weeks to be able to see if it
is as repeatable as the last one .
Contents of /proc/mdstat for md3 .
md3 : active raid6 sdx1[3] sdw1[2] sdv1[1] sdu1[0] sdt1[7](S) sds1[6] sdr1[5] sdq1[4]
717378560 blocks level 6, 1024k chunk, algorithm 2 [7/7] [UUUUUUU]
bitmap: 2/137 pages [8KB], 512KB chunk
Commands I ran that led to the 'BUG' .
bonniemd3() { /root/bonnie++-1.03a/bonnie++ -u0:0 -d /md3 -s 131072 -f; }
bonniemd3 > 131072MB-bonnie++-run-md3-xfs.log-20070825 2>&1 &
Ok, the 'bitmap' and 'raid6' details were the missing pieces of my
testing. I can now reproduce this bug in handle_queue. I'll keep you
posted on what I find.
Thank you for tracking this.
Regards,
You said to watch here & I have .
Is there any hope of digging this out ?
Anything further I can provide ? Please just say so .
Tia , JimL