On Fri, Jan 01, 2021 at 05:15:32PM -0500, Mike Marshall wrote: > Hi Matthew... Thanks so much for the suggestions! > > This is some new version of orangefs_readpage(), right? > No, that code has been upstream for a while... that readahead_control > thing looks very interesting :-) ... Oh, my, I was looking at a tree from before 2018 that still had orangefs_readpages. So, yes, I think what's happening is that orangefs_readpage() is being called from the readahead code. You'll hit this path: next_page = find_get_page(inode->i_mapping, index); if (next_page) { gossip_debug(GOSSIP_FILE_DEBUG, "%s: found next page, quitting\n", __func__); put_page(next_page); goto out; because readahead already allocated those pages for you and is trying to fill them one-at-a-time. Implementing ->readahead, even without dhowells' patch to expand the ractl will definitely help you! > -Mike > > On Thu, Dec 31, 2020 at 11:08 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > > On Thu, Dec 31, 2020 at 04:51:53PM -0500, Mike Marshall wrote: > > > Greetings... > > > > > > I hope some of you will suffer through reading this long message :-) ... > > > > Hi Mike! Happy New Year! > > > > > Orangefs isn't built to do small IO. Reading a > > > big file in page cache sized chunks is slow and painful. > > > I tried to write orangefs_readpage so that it would do a reasonable > > > sized hard IO, fill the page that was being called for, and then > > > go ahead and fill a whole bunch of the following pages into the > > > page cache with the extra data in the IO buffer. > > > > This is some new version of orangefs_readpage(), right? I don't see > > anything resembling this in the current codebase. Did you disable > > orangefs_readpages() as part of this work? Because the behaviour you're > > describing sounds very much like what the readahead code might do to a > > filesystem which implements readpage and neither readahead nor readpages. > > > > > orangefs_readpage gets called for the first four pages and then my > > > prefill kicks in and fills the next pages and the right data ends > > > up in /tmp/nine. I, of course, wished and planned for orangefs_readpage > > > to only get called once, I don't understand why it gets called four > > > times, which results in three extraneous expensive hard IOs. > > > > I might suggest some judicious calling of dump_stack() to understand > > exactly what's calling you. My suspicion is that it's this loop in > > read_pages(): > > > > while ((page = readahead_page(rac))) { > > aops->readpage(rac->file, page); > > put_page(page); > > } > > > > which doesn't test for PageUptodate before calling you. > > > > It'd probably be best if you implemented ->readahead, which has its own > > ideas about which pages would be the right ones to read. It's not always > > correct, but generally better to have that logic in the VFS than in each > > filesystem. > > > > You probably want to have a look at Dave Howells' work to allow > > the filesystem to expand the ractl: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter > > > > specifically this patch: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=fscache-iter&id=f582790b32d5d1d8b937df95a8b2b5fdb8380e46 > >