On Mon, Jan 19, 2009 at 01:24:39PM +0100, Peter Rabbitson wrote:
> Keld Jørn Simonsen wrote:
> > Hmm,
> >
> > Why is the command
> >
> > blockdev --setra 65536 /dev/md0
> >
> > really needed? I think the kernel should set a reasonable default here.
>
> The in-kernel default for a block device is 256 (128k) which is way too
> low. The MD subsystem tries to be a bit smarter and assigns the md
> device readahead according to the number of devices/raid level. For
> streaming (i.e. file server) these values are also too low. LVs can take
> a readahead specification at creation time and use that, but this is
> manual.

I would like to have something done automatically in the kernel, so that
you do not need to do it manually. People tend not to know that you need
to add the blockdev statement, e.g. in /etc/rc.local, to get decent
performance. And this is needed even for simpler arrays, such as a
4-drive raid10,f2, which can be set up on many recent motherboards with
SATA-II support directly off the mobo.

> It is arguable what the typical workload is, but I would lean towards
> big long linear reads (fileserver) vs short scattered ones (database).

My understanding is that readahead is only done when the kernel thinks
it is doing sequential reads. This is probably not the case when doing
database operations, so we are kind of safe here, IMHO.

> The real solution to the problem was proposed a long time ago, and it
> seems it got lost in the attic: http://lwn.net/Articles/155510/

Yes, interesting. The patch may not be ready for inclusion for some time
due to complexity and lack of testing. So I am wondering if we could
come up with a formula to set the readahead for raid. It seems like a
big readahead would not affect random reading; it would then only be
overkill for sequential reading of smallish files.

So how does the kernel detect that it is doing sequential reading?
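(As an aside on the blockdev --setra value mentioned at the top: the
argument is a count of 512-byte sectors, so 65536 is a fairly large
readahead. A quick shell check of the arithmetic; the device name in the
comment is just an example:)

```shell
# blockdev --setra takes a count of 512-byte sectors, so 65536 sectors is:
echo "$((65536 * 512 / 1024 / 1024)) MiB readahead"   # prints "32 MiB readahead"

# which one would then apply at boot, e.g. from /etc/rc.local
# (device name is just an example):
# blockdev --setra 65536 /dev/md0
```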
Maybe it detects that the new block to read on a specific file
descriptor is the follower of the previous read on the same FD? And then
we normally read a full chunk of the raid, which is at least something
like 64 KiB; this would take care of most database transactions. I would
think we should then find the smallest readahead value for a given
array, from chunk size and drive count, that gets the array to perform
as expected.

best regards
keld
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html