The future of readahead

Both Kent and David have had conversations with me about improving the
readahead filesystem interface this last week, and as I don't have time
to write the code, here's the design.

1. Kent doesn't like it that we do an XArray lookup for each page.
The proposed solution adds a (small) array of page pointers (or a
pagevec) to the struct readahead_control.  It may make sense to move
__readahead_batch() and readahead_page() out of line at that point.
This should be backed up with performance numbers.
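
To make point 1 concrete, here is a minimal userspace sketch of the batching idea. It is not kernel code: every toy_* name, the batch size of 8 and the lookup counter are assumptions invented for illustration, and the "page cache" is a plain array standing in for the XArray. It only shows how a small array of page pointers inside the control structure lets one lookup serve several pages.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH 8	/* hypothetical size of the small pointer array */

struct toy_page { unsigned long index; };

/* Stand-in for the page cache; pages[i] is the page at index i. */
struct toy_cache {
	struct toy_page *pages;
	unsigned long nr_pages;
	unsigned long nr_lookups;	/* counts simulated tree walks */
};

/* Stand-in for struct readahead_control with the proposed array added. */
struct toy_rac {
	struct toy_cache *cache;
	unsigned long next;	/* next index to fetch from the cache */
	unsigned long end;	/* one past the last index in the window */
	struct toy_page *batch[TOY_BATCH];
	unsigned int batch_count;
	unsigned int batch_pos;
};

/* One simulated walk returns up to 'max' consecutive pages. */
static unsigned int toy_lookup_batch(struct toy_cache *c, unsigned long start,
		unsigned long end, struct toy_page **out, unsigned int max)
{
	unsigned int n = 0;

	c->nr_lookups++;
	while (n < max && start + n < end && start + n < c->nr_pages) {
		out[n] = &c->pages[start + n];
		n++;
	}
	return n;
}

/* Hand out the next page, refilling the array only when it runs dry. */
static struct toy_page *toy_readahead_page(struct toy_rac *rac)
{
	if (rac->batch_pos == rac->batch_count) {
		if (rac->next >= rac->end)
			return NULL;
		rac->batch_count = toy_lookup_batch(rac->cache, rac->next,
				rac->end, rac->batch, TOY_BATCH);
		rac->batch_pos = 0;
		rac->next += rac->batch_count;
		if (rac->batch_count == 0)
			return NULL;
	}
	return rac->batch[rac->batch_pos++];
}
```

In this model a 20-page window costs three walks rather than twenty. Whether the real thing is a win is exactly what the performance numbers would need to show.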

2. David wants to be sure that readahead is aligned to a granule
size (eg 256kB) to support fscache.  When we last talked about it,
I suggested encoding the granule size in the struct address_space.
I no longer think this approach should be pursued, since ...

3. Kent wants to be able to expand readahead to encompass an entire fs
extent (if, eg, that extent is compressed or encrypted).  We don't know
that at the right point; the filesystem can't pass that information
through the generic_file_buffered_read() or filemap_fault() interface
to the readahead code.  So the right approach here is for the filesystem
to ask the readahead code to expand the readahead batch.

So solving #2 and #3 looks like a new interface for filesystems to call:

void readahead_expand(struct readahead_control *rac, loff_t start, u64 len);
or possibly
void readahead_expand(struct readahead_control *rac, pgoff_t start,
		unsigned int count);

It might not actually expand the readahead attempt at all -- for example,
if there's already a page in the page cache, or if it can't allocate
memory.  But this puts the responsibility for allocating pages in the VFS,
where it belongs.
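
As a rough illustration of those semantics, here is a userspace model of the second (pgoff_t, count) variant. Everything named toy_* is hypothetical, the allocation budget is just a way to simulate running out of memory, and the choice to keep growing upwards after the downward direction stalls is my assumption, not a decision. The caller at the bottom shows how a filesystem with a granule size (point 2) might round the batch out and ask for the expansion.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CACHE_PAGES 64

struct toy_mapping {
	bool present[TOY_CACHE_PAGES];	/* page already in the cache? */
	unsigned int alloc_budget;	/* pages we may still allocate */
};

struct toy_rac {
	struct toy_mapping *mapping;
	unsigned long index;		/* first page of the batch */
	unsigned int nr_pages;		/* number of pages in the batch */
};

/* Allocation lives on the "VFS" side; fails if cached or out of memory. */
static bool toy_add_page(struct toy_mapping *m, unsigned long idx)
{
	if (idx >= TOY_CACHE_PAGES || m->present[idx] || m->alloc_budget == 0)
		return false;
	m->alloc_budget--;
	m->present[idx] = true;
	return true;
}

/* Best effort: grow the batch towards [start, start + count). */
static void toy_readahead_expand(struct toy_rac *rac, unsigned long start,
		unsigned int count)
{
	unsigned long end = start + count;

	/* Grow downwards until we hit a cached page or run out of memory. */
	while (rac->index > start) {
		if (!toy_add_page(rac->mapping, rac->index - 1))
			break;
		rac->index--;
		rac->nr_pages++;
	}
	/* Then grow upwards under the same conditions. */
	while (rac->index + rac->nr_pages < end) {
		if (!toy_add_page(rac->mapping, rac->index + rac->nr_pages))
			break;
		rac->nr_pages++;
	}
}

/* Hypothetical filesystem caller: round the batch out to a granule. */
static void toy_fs_readahead(struct toy_rac *rac, unsigned int granule_pages)
{
	unsigned long start = rac->index - rac->index % granule_pages;
	unsigned long end = rac->index + rac->nr_pages;

	end = (end + granule_pages - 1) / granule_pages * granule_pages;
	toy_readahead_expand(rac, start, (unsigned int)(end - start));
	/* The filesystem must now cope with whatever range it actually got. */
}
```

The filesystem never allocates a page itself here; it states the range it would like and then reads back rac->index and rac->nr_pages to see what it was given.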

4. Mike wants to be able to do 4MB I/Os [1].  That should be covered by
the solution above.  Mike, just to clarify: do you need 4MB pages, or can
you work with some mixture of page sizes going as far as 1024 x 4kB pages?
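
For a sense of what "some mixture of page sizes" could mean in numbers, here is a small standalone sketch. toy_count_pages() and its greedy policy (always take the largest naturally aligned power-of-two page that fits) are assumptions for illustration only, not a description of what the readahead code or the THP patches do.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12	/* 4kB base pages, as in the question above */

/*
 * Count how many pages cover 'nr' base pages starting at 'index' when
 * each page is naturally aligned and at most 2^max_order base pages.
 */
static unsigned int toy_count_pages(unsigned long index, unsigned long nr,
		unsigned int max_order)
{
	unsigned int pages = 0;

	while (nr > 0) {
		unsigned int order = max_order;

		/* Shrink until the page is aligned at 'index' and fits. */
		while (order > 0 &&
		       ((index & ((1UL << order) - 1)) || (1UL << order) > nr))
			order--;
		index += 1UL << order;
		nr -= 1UL << order;
		pages++;
	}
	return pages;
}
```

With 4kB base pages, a 4MB I/O is (4UL << 20) >> TOY_PAGE_SHIFT = 1024 base pages.  At max_order 0 that is the 1024 x 4kB worst case, at order 10 it is a single 4MB page, and a 4MB range that starts one base page off alignment comes out as 11 pages of mixed sizes under this policy.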

5. I'm allocating larger pages in the readahead code (part of the THP
patch set [2]).

[1] https://lore.kernel.org/linux-fsdevel/CAOg9mSSrJp2dqQTNDgucLoeQcE_E_aYPxnRe5xphhdSPYw7QtQ@xxxxxxxxxxxxxx/
[2] http://git.infradead.org/users/willy/pagecache.git/commitdiff/c00bd4082c7bc32a17b0baa29af6974286978e1f


