Oops... my non-work email doesn't default to text only, so this
bounced to the list...

---------- Forwarded message ---------
From: Mike Marshall <hubcapsc@xxxxxxxxx>
Date: Fri, Jan 1, 2021 at 5:15 PM
Subject: Re: problem with orangefs readpage...
To: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Mike Marshall <hubcap@xxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>

Hi Matthew... Thanks so much for the suggestions!

> This is some new version of orangefs_readpage(), right?

No, that code has been upstream for a while... that readahead_control
thing looks very interesting :-) ...

-Mike

On Thu, Dec 31, 2020 at 11:08 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Thu, Dec 31, 2020 at 04:51:53PM -0500, Mike Marshall wrote:
> > Greetings...
> >
> > I hope some of you will suffer through reading this long message :-) ...
>
> Hi Mike! Happy New Year!
>
> > Orangefs isn't built to do small IO. Reading a
> > big file in page cache sized chunks is slow and painful.
> > I tried to write orangefs_readpage so that it would do a reasonable
> > sized hard IO, fill the page that was being called for, and then
> > go ahead and fill a whole bunch of the following pages into the
> > page cache with the extra data in the IO buffer.
>
> This is some new version of orangefs_readpage(), right? I don't see
> anything resembling this in the current codebase. Did you disable
> orangefs_readpages() as part of this work? Because the behaviour you're
> describing sounds very much like what the readahead code might do to a
> filesystem which implements readpage and neither readahead nor readpages.
>
> > orangefs_readpage gets called for the first four pages and then my
> > prefill kicks in and fills the next pages and the right data ends
> > up in /tmp/nine. I, of course, wished and planned for orangefs_readpage
> > to only get called once, I don't understand why it gets called four
> > times, which results in three extraneous expensive hard IOs.
>
> I might suggest some judicious calling of dump_stack() to understand
> exactly what's calling you. My suspicion is that it's this loop in
> read_pages():
>
>         while ((page = readahead_page(rac))) {
>                 aops->readpage(rac->file, page);
>                 put_page(page);
>         }
>
> which doesn't test for PageUptodate before calling you.
>
> It'd probably be best if you implemented ->readahead, which has its own
> ideas about which pages would be the right ones to read. It's not always
> correct, but generally better to have that logic in the VFS than in each
> filesystem.
>
> You probably want to have a look at Dave Howells' work to allow
> the filesystem to expand the ractl:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> specifically this patch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=fscache-iter&id=f582790b32d5d1d8b937df95a8b2b5fdb8380e46
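
To illustrate the suspicion above, here is a toy userspace model of that
loop. It is only a sketch and not kernel code: struct toy_cache,
toy_readpage(), toy_read_pages(), and the NPAGES and PREFILL values are all
hypothetical stand-ins, and the uptodate[] array merely imitates what
PageUptodate() would report. It assumes a readpage that does one large IO and
marks the following pages as filled, as described in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
#define NPAGES  16   /* pages in the pretend file (assumed) */
#define PREFILL 8    /* pages one "hard IO" brings in (assumed) */

struct toy_cache {
	bool uptodate[NPAGES];   /* stand-in for PageUptodate() */
	int hard_ios;            /* number of expensive IOs issued */
};

/*
 * Stand-in for a readpage that does one large IO, fills the requested
 * page, and prefills the pages that follow it.
 */
static void toy_readpage(struct toy_cache *c, size_t index)
{
	size_t end = index + PREFILL;

	if (end > NPAGES)
		end = NPAGES;
	c->hard_ios++;
	for (size_t i = index; i < end; i++)
		c->uptodate[i] = true;
}

/*
 * Model of the quoted loop: call readpage for every page in the window.
 * When skip_uptodate is set, pages that are already filled are skipped.
 * Returns the total number of hard IOs issued so far.
 */
static int toy_read_pages(struct toy_cache *c, size_t start, size_t nr,
			  bool skip_uptodate)
{
	assert(start + nr <= NPAGES);
	for (size_t i = start; i < start + nr; i++) {
		if (skip_uptodate && c->uptodate[i])
			continue;
		toy_readpage(c, i);
	}
	return c->hard_ios;
}
```

In this model, a four-page window with skip_uptodate false issues four hard
IOs, matching the four calls reported in the thread, while the same window
with skip_uptodate true issues one. The advice in the thread is still to
implement ->readahead rather than to rely on such a check.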