A few weeks ago Matthew Wilcox helped me see how mm/readahead.c/read_pages was dropping down into some code designed to take over for filesystems that didn't implement ->readahead, and how this "failover" code was screwing over the readahead-like code I'd put into orangefs_readpage. I studied a bunch of readahead code in fs and mm and other filesystems and came up with this patch that seems to work in the tests I've done so far. Sometimes code I like is instead irritating to Linus or Al Viro :-), so hopefully some of y'all will look over what I've got here. There's a couple of printk's I've left in orangefs_readpage that don't belong upstream, they help me now for testing though... Besides the diff at the end of this message, the code is in: https://github.com/hubcapsc/linux/tree/readahead I wish I knew how to specify _nr_pages in the readahead_control structure so that all the extra pages I need could be obtained in readahead_page instead of part there and the rest in my open-coded stuff in orangefs_readpage. But it looks to me as if values in the readahead_control structure are set heuristically outside of my control over in ondemand_readahead? [root@vm3 linux]# git diff master..readahead diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c index 48f0547d4850..682a968cb82a 100644 --- a/fs/orangefs/inode.c +++ b/fs/orangefs/inode.c @@ -244,6 +244,25 @@ static int orangefs_writepages(struct address_space *mapping, static int orangefs_launder_page(struct page *); +/* + * Prefill the page cache with some pages that we're probably + * about to need... + */ +static void orangefs_readahead(struct readahead_control *rac) +{ + pgoff_t index = readahead_index(rac); + struct page *page; + + while ((page = readahead_page(rac))) { + prefetchw(&page->flags); + put_page(page); + unlock_page(page); + index++; + } + + return; +} + static int orangefs_readpage(struct file *file, struct page *page) { struct inode *inode = page->mapping->host; @@ -260,11 +279,16 @@ static int orangefs_readpage(struct file *file, struct page *page) int remaining; /* - * Get up to this many bytes from Orangefs at a time and try - * to fill them into the page cache at once. Tests with dd made - * this seem like a reasonable static number, if there was - * interest perhaps this number could be made setable through - * sysfs... + * Orangefs isn't a good fit for reading files one page at + * a time. Get up to "read_size" bytes from Orangefs at a time and + * try to fill them into the page cache at once. Readahead code in + * mm already got us some extra pages by calling orangefs_readahead, + * but it doesn't know how many we actually wanted, so we'll + * get some more after we use up the extra ones we got from + * orangefs_readahead. Tests with dd made "read_size" seem + * like a reasonable static number of bytes to get from orangefs, + * if there was interest perhaps "read_size" could be made + * setable through sysfs or something... */ read_size = 524288; @@ -302,31 +326,19 @@ static int orangefs_readpage(struct file *file, struct page *page) slot_index = 0; while ((remaining - PAGE_SIZE) >= PAGE_SIZE) { remaining -= PAGE_SIZE; - /* - * It is an optimization to try and fill more than one - * page... by now we've already gotten the single - * page we were after, if stuff doesn't seem to - * be going our way at this point just return - * and hope for the best. - * - * If we look for pages and they're already there is - * one reason to give up, and if they're not there - * and we can't create them is another reason. - */ index++; slot_index++; - next_page = find_get_page(inode->i_mapping, index); + next_page = find_lock_page(inode->i_mapping, index); if (next_page) { - gossip_debug(GOSSIP_FILE_DEBUG, - "%s: found next page, quitting\n", - __func__); - put_page(next_page); - goto out; + printk("%s: found readahead page\n", __func__); + } else { + next_page = + find_or_create_page(inode->i_mapping, + index, + GFP_KERNEL); + printk("%s: alloced my own page\n", __func__); } - next_page = find_or_create_page(inode->i_mapping, - index, - GFP_KERNEL); /* * I've never hit this, leave it as a printk for * now so it will be obvious. @@ -659,6 +671,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb, /** ORANGEFS2 implementation of address space operations */ static const struct address_space_operations orangefs_address_operations = { .writepage = orangefs_writepage, + .readahead = orangefs_readahead, .readpage = orangefs_readpage, .writepages = orangefs_writepages, .set_page_dirty = __set_page_dirty_nobuffers, diff --git a/fs/orangefs/orangefs-mod.c b/fs/orangefs/orangefs-mod.c index 74a3d6337ef4..cd7297815f91 100644 --- a/fs/orangefs/orangefs-mod.c +++ b/fs/orangefs/orangefs-mod.c @@ -31,7 +31,7 @@ static ulong module_parm_debug_mask; __u64 orangefs_gossip_debug_mask; int op_timeout_secs = ORANGEFS_DEFAULT_OP_TIMEOUT_SECS; int slot_timeout_secs = ORANGEFS_DEFAULT_SLOT_TIMEOUT_SECS; -int orangefs_cache_timeout_msecs = 50; +int orangefs_cache_timeout_msecs = 500; int orangefs_dcache_timeout_msecs = 50; int orangefs_getattr_timeout_msecs = 50;