On Tue, 20 Jul 2010 19:40:05 +0800 Eddy Zhao <eddy.y.zhao@xxxxxxxxx> wrote:

> Hello Neil:
>
> We observe a periodic write throughput drop on md RAID5. See the description below.
>
> Configuration
> - linux 2.6.28.9
> - 3 Seagate 320GB 7200rpm SATA2.0 disks
> - md RAID5, 3 disks, 256KB chunk
>
> Test
> - open O_DIRECT /dev/md0
> - sequential write, 512KB write block
> - refer to "fpt.cpp" ("ulimit -s unlimited" before running the program)
>
> Problem
> - md RAID5 write throughput drops for 1~2s every 16s (at a 1Hz sample
>   rate)
> - refer to "output.txt"
>
> Do you know the reason for the problem? We want to fix it on our server to
> make the QoS smooth.

If I'm interpreting your numbers correctly, it is just an occasional single
write that is slow, not a series of writes during a one-second interval that
are each slow. It would help if you could confirm that.

Two possibilities occur to me, though it could be something else altogether.
You would need to instrument the code to collect internal state to see whether
it is one of these or something else.

1/ A scheduler problem could be delaying the running of raid5d from time to
   time, so that it either doesn't respond to ready stripes quickly or cannot
   get CPU time to perform the xor.

2/ For some reason raid5 sometimes decides that it needs to pre-read the
   'other' block to calculate parity rather than waiting for the other block
   to be written. This is more likely. Either there is bad code somewhere, or
   the raid5 is being 'unplugged' prematurely.

This seems to happen with a period of 30 seconds (I don't know where you got
16 from). The command:

   tr : ' ' < output.txt | sed 's/ms//' | awk '$4 > 100 {print NR, NR-p; p=NR}'

(which prints the line number of every sample whose fourth field exceeds 100,
together with the number of lines since the previous such sample) suggests
that intervals of 1 or 33 seconds are most common, though you could get more
precise data out of your program. I suspect this aligns with the 30-second
periodic 'flush' that Linux does, though I'm not 100% certain.

You could possibly put a 'WARN_ON' in raid5_activate_delayed if delayed_list
is not empty.
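
To make possibility 2/ concrete: with 3 disks and a 256KB chunk, each stripe
holds 2 x 256KB = 512KB of data, so an aligned 512KB write covers exactly one
stripe and parity can be computed from the new data alone. The C sketch below
is a toy model under that assumption only: struct stripe, prereads_needed()
and full_stripe_parity() are hypothetical names, not md's raid5.c structures,
and the chunk is shrunk to 8 bytes. It counts how many data chunks would have
to be pre-read if a stripe were handled before all of its new data arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model (not md's implementation) of one stripe of a
 * 3-disk RAID5: two data chunks plus one parity chunk. CHUNK is shrunk
 * from the real 256KB to keep the example small. */
#define NDATA 2
#define CHUNK 8

struct stripe {
    uint8_t data[NDATA][CHUNK]; /* new data waiting to be written */
    int have_new[NDATA];        /* 1 if that chunk's new data is cached */
    uint8_t parity[CHUNK];
};

/* Number of data chunks that must be read from disk before parity can be
 * rebuilt from all data chunks ("reconstruct-write"): one per chunk whose
 * new data has not arrived yet. */
static int prereads_needed(const struct stripe *s)
{
    int missing = 0;
    for (size_t d = 0; d < NDATA; d++)
        if (!s->have_new[d])
            missing++;
    return missing;
}

/* Full-stripe write: every data chunk is already in memory, so
 * P = D0 ^ D1 is computed without touching the disks. */
static void full_stripe_parity(struct stripe *s)
{
    assert(prereads_needed(s) == 0);
    memset(s->parity, 0, CHUNK);
    for (size_t d = 0; d < NDATA; d++)
        for (size_t i = 0; i < CHUNK; i++)
            s->parity[i] ^= s->data[d][i];
}
```

In this model a stripe that is handled after only its first chunk has arrived
costs one extra disk read before the parity write can go out, which is the
kind of stall described in 2/. The WARN_ON in raid5_activate_delayed targets
the point where delayed stripes get released for processing.
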
That will give you a stack trace showing why the unplug was called.

I'd be keen to hear about any further discoveries you make.

BTW I prefer that all such questions be posted to linux-raid@xxxxxxxxxxxxxxx,
as others may be able to contribute. I have taken the liberty of cc:ing this
reply there. I hope you are OK with that.

NeilBrown

>
> FYI: "Single disk" and "2 disk RAID0" write throughput are both smooth (at a
> 1Hz sample rate)
>
>
> Thanks
> Eddy