Re: /proc/PID/io/read_bytes accounting regression?

David Wysochanski <dwysocha@xxxxxxxxxx> · Fri, 17 Feb 2023 10:54:01 -0500

On Fri, Feb 17, 2023 at 10:47 AM David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
>
> On Fri, Feb 17, 2023 at 9:36 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Feb 17, 2023 at 09:08:27AM -0500, David Wysochanski wrote:
> > > On Fri, Feb 17, 2023 at 6:56 AM Daire Byrne <daire@xxxxxxxx> wrote:
> > > > On newer kernels, it looks like the task io accounting is not
> > > > incrementing the read_bytes when reading from a NFS mount? This was
> > > > definitely working on v5.16 downwards, but has not been working since
> > > > v5.18 up to v6.2 (I haven't tested v5.17 yet).
> > >
> > > In v5.16 we had this call to task_io_account_read(PAGE_SIZE); on line 109
> > > of read_cache_pages();
> > >
> > > But there's no call to task_io_account_read() anymore in the new
> > > readahead code paths that I could tell,
> > > but maybe I'm missing something.
> > >
> > > Willy,
> > > Does each caller of readahead_page() now need to call
> > > task_io_account_read() or should we add that into
> > > readahead_page() or maybe inside read_pages()?
> >
> > I think the best way is to mimic what the block layer does as closely as
> > possible.  Unless we can pull it out of the block layer & all
> > filesystems and put it in the VFS (which we can't; the VFS doesn't
> > know which blocks are recorded by the filesystem as holes and will not
> > result in I/O).
> >
> Caller, ok.  I see, that makes sense.
> Looks like cifs has a call to task_io_account_read() after data has been
> received.  Also netfs has a call as well at the bottom of
> netfs_rreq_unlock_folios().
> Both seems to be _after_ data has been received, but I'm not sure that's
> correct.
>
> > The block layer does it as part of the BIO submission path (and also
> > counts PGPGIN and PGPGOUT, which no network filesystems seem to do?)
> > You're more familiar with the NFS code than I am, so you probably
> > have a better idea than __nfs_pageio_add_request().
> >
> Hmm, yes does the block layer's accounting take into account actual
> bytes read or just submitted?  Maybe I need to look at this closer
> but at first glance it looks like these numbers may sometimes be
> incremented when actual data is received and others are incremented
> when the submission happens.
>
> As to the right location in NFS, the function you mention isn't a bad
> idea, but maybe not the right location.  Looking in nfs_file_direct_read()
> we have the accounting at IO submission time, appears to be the
> same as the block layer.  Also since my NFS netfs conversion patches
> are still outstanding, I'll have to somehow take the netfs call into account
> if fscache is enabled.  So the right place is looking like somewhere
> in nfs_read_folio() and nfs_readahead().

I should have read the kernel docs.  From
Documentation/filesystems/proc.rst

1744 read_bytes
1745 ^^^^^^^^^^
1746
1747 I/O counter: bytes read
1748 Attempt to count the number of bytes which this process really did cause to
1749 be fetched from the storage layer. Done at the submit_bio() level, so it is
1750 accurate for block-backed filesystems. <please add status regarding NFS and
1751 CIFS at a later time>

So it looks like NFS directIO (and non-direct, prior to v5,18) did the
same thing
as the block layer and is consistent with the definition.
Fix would be just add a call to task_io_account_read() inside nfs_read_folio()
and nfs_readahead().