Re: Regression in NFS probably due to very large amounts of readahead

Anders Blomdell <anders.blomdell@xxxxxxxxx> · Wed, 27 Nov 2024 08:55:12 +0100

On 2024-11-26 19:42, Matthew Wilcox wrote:
On Tue, Nov 26, 2024 at 06:26:13PM +0100, Anders Blomdell wrote:
On 2024-11-26 17:55, Matthew Wilcox wrote:
On Tue, Nov 26, 2024 at 04:28:04PM +0100, Anders Blomdell wrote:
On 2024-11-26 16:06, Jan Kara wrote:
Hum, checking the history the update of ra->size has been added by Neil two
years ago in 9fd472af84ab ("mm: improve cleanup when ->readpages doesn't
process all pages"). Neil, the changelog seems as there was some real
motivation behind updating of ra->size in read_pages(). What was it? Now I
somewhat disagree with reducing ra->size in read_pages() because it seems
like a wrong place to do that and if we do need something like that,
readahead window sizing logic should rather be changed to take that into
account? But it all depends on what was the real rationale behind reducing
ra->size in read_pages()...

My (rather limited) understanding of the patch is that it was intended to read those pages
that didn't get read because the allocation of a bigger folio failed, while not redoing what
readpages already did; how it was actually going to accomplish that is still unclear to me,
but I even don't even quite understand the comment...

	/*
	 * If there were already pages in the page cache, then we may have
	 * left some gaps.  Let the regular readahead code take care of this
	 * situation.
	 */

the reason for an unchanged async_size is also beyond my understanding.

This isn't because we couldn't allocate a folio, this is when we
allocated folios, tried to read them and we failed to submit the I/O.
This is a pretty rare occurrence under normal conditions.

I beg to differ, the code is reached when there is
no folio support or ra->size < 4 (not considered in
this discussion) or falling throug when !err, err
is set by:

         err = ra_alloc_folio(ractl, index, mark, order, gfp);
                 if (err)
                         break;

isn't the reading done by:

         read_pages(ractl);

which does not set err!

You're misunderstanding.  Yes, read_pages() is called when we fail to
allocate a fresh folio; either because there's already one in the
page cache, or because -ENOMEM (or if we raced to install one), but
it's also called when all folios are normally allocated.  Here:

         /*
          * Now start the IO.  We ignore I/O errors - if the folio is not
          * uptodate then the caller will launch read_folio again, and
          * will then handle the error.
          */
         read_pages(ractl);

So at the point that read_pages() is called, all folios that ractl
describes are present in the page cache, locked and !uptodate.

After calling aops->readahead() in read_pages(), most filesystems will
have consumed all folios described by ractl.  It seems that NFS is
choosing not to submit some folios, so rather than leave them sitting
around in the page cache, Neil decided that we should remove them from
the page cache.
More like me not reading the comments properly, sorry. What I thought I
said, was that the problematic code in the call to do_page_cache_ra was
reached when the folio alloction returned an error. Sorry for not being
clear, and thanks for your patience.

/Anders