Re: Disk IO, "Parallel sequential" load: read-ahead inefficient? FS tuning?

Andi Kleen <andi@xxxxxxxxxxxxxx> · Thu, 09 Apr 2009 20:05:03 +0200

"Frantisek Rysanek" <Frantisek.Rysanek@xxxxxxx> writes:

> I don't understand all the tweakable knobs of mkfs.xfs - not well
> enough to match the 4MB RAID chunk size somewhere in the internal
> structure of XFS.

If it's software RAID, a recent mkfs.xfs should be able to figure
out the stripe sizes on its own.

> Another problem is, that there seems to be a single tweakable knob to 
> read-ahead in Linux 2.6, accessible in several ways:
>   /sys/block/<dev>/queue/max_sectors_kb
>   /sbin/blockdev --setra
>   /sbin/blockdev --setfra

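/* mm/readahead.c: every readahead request (in pages) is capped at half
 * of the inactive file cache plus free memory on the local NUMA node. */
unsigned long max_sane_readahead(unsigned long nr)
{
        return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
                + node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
}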

So you can raise that cap indirectly, by keeping a lot of memory free
with vm.min_free_kbytes. Probably not an optimal solution.

>
> Based on some manpages on the madvise() and fadvise() functions, I'd 
> say that the level of read-ahead corresponding to MADV_SEQUENTIAL and 
> FADV_SEQUENTIAL is still decimal orders less than the desired figure.

Wu Fengguang (cc'ed) is doing a lot of work on the MADV_* readahead
algorithms. He recently posted a new patchkit on linux-kernel
that you might try. It still uses strict limits, but it's better
at figuring out specific access patterns.

But if you really know what kind of readahead is needed,
it might be better to implement it directly in the
application than to rely on kernel heuristics.

For example, sys_readahead() is now widely used to speed up
booting.

-Andi
-- 
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.