Re: performance problems with raid10,f2

Keld Jørn Simonsen <keld@xxxxxxxx> · Fri, 4 Apr 2008 10:03:59 +0200

On Thu, Apr 03, 2008 at 09:20:37PM +0100, Peter Grandi wrote:
> >>> On Wed, 2 Apr 2008 23:13:15 +0200, Keld Jørn Simonsen
> >>> <keld@xxxxxxxx> said:
> 
> [ ... slow RAID reading ... ]
> 
> >> That could be the usual issue with apparent pauses in the
> >> stream of IO requests to the array component devices, with
> >> the usual workaround of trying 'blockdev --setra 65536
> >> /dev/mdN' and see if sequential reads improve.
> 
> keld> Yes, that did it! 
> 
> But that's as usual very wrong. Such a large read-ahead has
> negative consequences, and most likely is the result of both
> some terrible misdesign in the Linux block IO subsystem (from
> some further experiments it is most likely related to "plugging")
> and integration of MD into it.
> 
> However I have found that on relatively fast machines (I think)
> much lower values of read-ahead still give reasonable speed,
> with some values being much better than others. For example with
> another RAID10 I get pretty decent speed with a read-ahead of
> 128 on '/dev/md0' (but much worse with say 64 or 256). On others
> 1000 sectors read-ahead is good.
> 
> The read-ahead needed also depends a bit on the file system
> type, don't trust tests done on the block device itself.
> 
> So please experiment a bit to try and reduce it, at least until
> I find the time to figure out the (surely embarrassing) reason
> why it is needed and how to avoid it, or the Linux block IO and
> MD maintainers confess (they almost surely already know why)
> and/or fix it already.

I did experiment and I noted that a 16 MiB readahead was sufficient.
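
For reference, a small sketch of the unit conversion involved. It assumes
that 'blockdev --setra' counts 512-byte sectors, as the blockdev man page
describes; the helper names are made up for illustration. Under that
assumption the suggested 65536 corresponds to 32 MiB, and a 16 MiB
read-ahead, if that figure is meant literally, would be 32768.

```python
# Assumption: blockdev --setra takes its argument in 512-byte sectors.
SECTOR_BYTES = 512


def setra_to_mib(sectors):
    """Convert a --setra value (512-byte sectors) to MiB."""
    return sectors * SECTOR_BYTES / (1024 * 1024)


def mib_to_setra(mib):
    """Convert a read-ahead size in MiB to a --setra value."""
    return int(mib * 1024 * 1024 // SECTOR_BYTES)


if __name__ == "__main__":
    print(setra_to_mib(65536))  # the value suggested earlier in the thread
    print(mib_to_setra(16))     # a 16 MiB read-ahead expressed in sectors
```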

Then I wondered whether this had negative consequences, e.g. on random
reads.

I then ran a test reading 1000 files concurrently, and some strange
things happened. Each drive was doing about 2000 transactions per
second (tps). Why? I thought a drive could only do about 150 tps, given
that it is a 7200 rpm drive.
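
A back-of-envelope sketch of where a figure in that range comes from. It
assumes every operation pays an average seek plus half a rotation; the
seek time passed in below is an illustrative guess, not a measurement of
these drives, and the function names are hypothetical.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in ms."""
    ms_per_rev = 60_000 / rpm
    return ms_per_rev / 2


def random_iops(rpm, avg_seek_ms):
    """Rough random operations per second for a single drive.

    avg_seek_ms is an assumed average seek time, not a measured one.
    """
    return 1000 / (avg_seek_ms + rotational_latency_ms(rpm))


if __name__ == "__main__":
    # Rotation alone (zero seek) already caps a 7200 rpm drive.
    print(round(random_iops(7200, 0.0)))
    # With an assumed 8.5 ms average seek the estimate drops further.
    print(round(random_iops(7200, 8.5)))
```

On this model even a zero-seek drive stays far below 2000 operations per
second, so the observed number is hard to explain as fully random
operations; it would fit better with requests that mostly do not incur a
seek.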

What is tps measuring?

Why is the fs not reading a full chunk in every IO operation?
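
One way to look into both questions is to compute the numbers directly.
The sketch below assumes the tps figure came from a tool such as iostat,
which derives it from the completed read and write counts in
/proc/diskstats, and that the sector counts there are in 512-byte units.
The function names and the device name in the usage note are
placeholders.

```python
def parse_diskstats(text, device):
    """Extract completed I/Os and sectors moved for one device.

    Expects the /proc/diskstats layout: major, minor, name, reads
    completed, reads merged, sectors read, ms reading, writes
    completed, writes merged, sectors written, ...
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[2] == device:
            return {
                "ios": int(parts[3]) + int(parts[7]),
                "sectors": int(parts[5]) + int(parts[9]),
            }
    raise KeyError(device)


def tps_and_avg_request_kib(before, after, device, interval_s):
    """Return (requests per second, average request size in KiB)."""
    a = parse_diskstats(before, device)
    b = parse_diskstats(after, device)
    ios = b["ios"] - a["ios"]
    sectors = b["sectors"] - a["sectors"]
    tps = ios / interval_s
    # Assumption: diskstats sectors are 512 bytes each.
    avg_kib = (sectors * 512 / 1024 / ios) if ios else 0.0
    return tps, avg_kib
```

To use it, read /proc/diskstats twice a known number of seconds apart and
pass both snapshots with the component device name (for example 'sda').
If the average request size comes out much smaller than the chunk size,
the high tps is made up of many small requests rather than chunk-sized
ones.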

Best regards
keld