On Sat, Nov 21, 2020 at 1:48 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>
>
> > On Nov 21, 2020, at 1:28 PM, David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> >
> > On Sat, Nov 21, 2020 at 12:16 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>
> >>
> >>
> >>> On Nov 21, 2020, at 12:01 PM, David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> >>>
> >>> On Sat, Nov 21, 2020 at 11:14 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>>>
> >>>> Hi Dave-
> >>>>
> >>>>> On Nov 21, 2020, at 8:29 AM, Dave Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> These patches update the NFS client to use the new netfs and fscache
> >>>>> APIs and are at:
> >>>>>   https://github.com/DaveWysochanskiRH/kernel.git
> >>>>>   https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
> >>>>>   https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
> >>>>>
> >>>>> The patches are based on David Howells' fscache-iter tree at
> >>>>>   https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> >>>>>
> >>>>> The first 6 patches refactor some of the NFS read code to facilitate
> >>>>> re-use, the next 6 patches do the conversion to the new API, and the
> >>>>> last patch is a somewhat awkward fix for a problem seen in final
> >>>>> testing.
> >>>>>
> >>>>> Per David Howells' recent post, note that the new fscache API is
> >>>>> divided into two separate APIs, a 'netfs' API and an 'fscache' API.
> >>>>> The netfs API was designed to help simplify the IO paths of network
> >>>>> filesystems, and can be called even when fscache is disabled, thus
> >>>>> simplifying both readpage and readahead implementations.  However,
> >>>>> for now these NFS conversion patches only call the netfs API when
> >>>>> fscache is enabled, similar to the existing NFS code.
> >>>>>
> >>>>> Trond and Anna, I would appreciate your guidance on this patchset.
> >>>>> At least I would like your feedback regarding the direction
> >>>>> you would like these patches to go, in particular, the following
> >>>>> items:
> >>>>>
> >>>>> 1. Whether you are ok with using the netfs API unconditionally even
> >>>>>    when fscache is disabled, or prefer this "least invasive to NFS"
> >>>>>    approach.  Note the unconditional use of the netfs API is the
> >>>>>    recommended approach per David's post and the AFS and CEPH
> >>>>>    implementations, but I was unsure if you would accept this
> >>>>>    approach or would prefer to minimize changes to NFS.  Note if
> >>>>>    we keep the current approach to minimize NFS changes, we will
> >>>>>    have to address some problems with page unlocking, such as with
> >>>>>    patch 13 in the series.
> >>>>>
> >>>>> 2. Whether to keep the NFS fscache implementation as "read only",
> >>>>>    or whether to add write-through support.  Today we only enable
> >>>>>    fscache when a file is open read-only and disable / invalidate
> >>>>>    when a file is open for write.
> >>>>>
> >>>>> Still TODO
> >>>>> 1. Address known issues (lockdep, page unlocking), depending on
> >>>>>    what is decided as far as implementation direction
> >>>>>    a) nfs_issue_op: takes rcu_read_lock but may call nfs_page_alloc()
> >>>>>       with GFP_KERNEL, which may sleep (dhowells noted this in a review)
> >>>>>    b) nfs_refresh_inode() takes inode->i_lock but may call
> >>>>>       __fscache_invalidate(), which may sleep (found with lockdep)
> >>>>> 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
> >>>>>    * Compare with netfs stats and determine if still needed
> >>>>> 3. Cleanup dfprintks and/or convert to tracepoints
> >>>>> 4. Further tests (see "Not tested yet")
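
A note on known issue 'b' above, to make the discussion concrete: the
shape of the problem is a sleeping call made while holding
inode->i_lock.  The sketch below only illustrates the kind of fix I
have in mind (record the need to invalidate while holding the lock,
make the fscache call after the lock is dropped); nfs_example_refresh()
is a made-up name, and nfs_fscache_invalidate() stands in for whatever
wrapper ends up reaching __fscache_invalidate().

static void nfs_example_refresh(struct inode *inode, bool change_detected)
{
	bool invalidate = false;

	spin_lock(&inode->i_lock);
	/* ... existing attribute / cache_validity updates ... */
	if (change_detected)
		invalidate = true;	/* only record the decision here */
	spin_unlock(&inode->i_lock);

	/* No longer holding the spinlock, so a sleeping call is safe. */
	if (invalidate)
		nfs_fscache_invalidate(inode);
}

Untested, just to show the shape of the fix.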
> >>>>
> >>>> Can you say whether your approach has any performance impact?
> >>> No I cannot.
> >>>
> >>>> In particular, what comparative benchmarks have been run?
> >>>>
> >>> No comparisons so far, but note the last bullet - "performance".
> >>>
> >>> Are you wondering about performance with/without fscache for this
> >>> series, or the old vs new fscache, or something else?
> >>
> >> I'd like to have some explicit performance-related merge-worthiness
> >> criteria.  For example: "No performance regressions, and here's how
> >> we're going to determine that we're good: fio / iozone / yada with
> >> NFS/TCP and NFS/RDMA on 100GbE; for very little additional CPU
> >> cost measured via perf xyzzy.  Also some benchmark that measures lock
> >> contention."
> >>
> >> We haven't been especially careful about this in the past when
> >> reworking the client's primary I/O paths.  Nothing unreasonable, but
> >> it should be stated up front where we want to end up.
> >>
> > Makes sense.
> >
> >> Another approach might be: we're going to start by making fscache
> >> opt-in.  As confidence increases over time and good performance is
> >> demonstrated, then we'll unify the fscache and non-cached I/O paths.
> >>
> > It sounds like what you want is what I've done in this first implementation.
>
> My understanding is that you haven't decided whether to take the
> opt-in approach or to convert the primary I/O paths right now.
>
I posted this approach because I thought it would be the shortest path
to getting the new fscache changes merged.  I have another patch in
progress that converts to using the netfs API unconditionally.  However,
I didn't think it made sense to post that until I received feedback on
this set and on this approach vs the more invasive one.

>
> > This implementation takes a "least invasive to NFS" approach, staying
> > with the old fscache on/off logic, even though this was not ideal or what
> > was recommended as the end game for the netfs API.
>
> I'm not taking a position on your two approaches, but when the time
> comes to take the approach that involves full integration, I would
> like us to have an agreed-upon set of performance goals.  I don't want
> that integration to cause our high-performance environments (NFS/RDMA,
> for instance) to lose out.
>
> Btw, it isn't clear yet that we need to use the fscache APIs to
> introduce proper huge page support.  One of our current efforts is
> to convert xdr_buf from the use of an array of struct page pointers
> to an array of struct bio_vec pointers.  In that case each entry in
> the array carries the size of the thing that the entry points to,
> which today is always PAGE_SIZE, but in the future could be much
> larger.
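
The bio_vec direction makes sense to me.  Just so I'm reading it right,
and purely as an illustration (not code from either effort; the helper
name and the 2MB figure are made up): with an array of struct page
pointers every element implies PAGE_SIZE, while a bio_vec entry carries
its own offset and length, so a single entry could describe a much
larger contiguous region:

static struct bio_vec nfs_example_bvec(struct page *huge_page)
{
	struct bio_vec bv = {
		.bv_page   = huge_page,	/* e.g. a 2MB compound page */
		.bv_offset = 0,
		.bv_len    = SZ_2M,	/* the length travels with the entry */
	};

	return bv;
}

whereas today the same 2MB region would need 512 PAGE_SIZE entries in a
struct page * array.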
>
> >>>>> Checks run
> >>>>> 1. checkpatch: PASS*
> >>>>>    * a few warnings, mostly trivial fixups, some unrelated to this set
> >>>>> 2. kernel builds with each patch: PASS
> >>>>>    * each patch in the series built cleanly, which ensures bisectability
> >>>>>
> >>>>> Tests run
> >>>>> 1. Custom NFS+fscache unit tests for basic operation: PASS*
> >>>>>    * no oops or data corruptions
> >>>>>    * Some op counts are a bit off, but these are mostly due
> >>>>>      to statistics not implemented properly (NFSIOS_FSCACHE_*)
> >>>>> 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
> >>>>>    * No failures or oopses for any version or test options
> >>>>> 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
> >>>>>    * No failures or oopses
> >>>>> 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
> >>>>>    * all builds finish without errors or data corruption
> >>>>>    * one lockdep "scheduling while atomic" warning fired with NFS41,
> >>>>>      due to an fscache invalidation code path (known issue 'b' above)
> >>>>> 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
> >>>>>    * generic/013 (pass, but triggers i_lock lockdep warning; known issue 'a' above)
> >>>>>    * NOTE: The following tests failed with errors, but they
> >>>>>      also fail on vanilla 5.10-rc4, so they are not related to this
> >>>>>      patchset
> >>>>>      * generic/074 (lockdep invalid wait context - nfs_free_request())
> >>>>>      * generic/538 (short read)
> >>>>>      * generic/551 (pread: Unknown error 524, Data verification fail)
> >>>>>      * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
> >>>>>
> >>>>> Not tested yet:
> >>>>>    * error injections (for example, connection disruptions, server errors during IO, etc.)
> >>>>>    * pNFS
> >>>>>    * many processes doing mixed read/write on the same file
> >>>>>    * performance
> >>>>>
> >>>>> Dave Wysochanski (13):
> >>>>>   NFS: Clean up nfs_readpage() and nfs_readpages()
> >>>>>   NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
> >>>>>     succeeds
> >>>>>   NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
> >>>>>     nfs_readdesc
> >>>>>   NFS: Call readpage_async_filler() from nfs_readpage_async()
> >>>>>   NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
> >>>>>   NFS: Allow internal use of read structs and functions
> >>>>>   NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
> >>>>>   NFS: Convert fscache_enable_cookie and fscache_disable_cookie
> >>>>>   NFS: Convert fscache invalidation and update aux_data and i_size
> >>>>>   NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
> >>>>>   NFS: Convert readpage to readahead and use netfs_readahead for fscache
> >>>>>   NFS: Allow NFS use of new fscache API in build
> >>>>>   NFS: Ensure proper page unlocking when reads fail with retryable
> >>>>>     errors
> >>>>>
> >>>>>  fs/nfs/Kconfig             |   2 +-
> >>>>>  fs/nfs/direct.c            |   2 +
> >>>>>  fs/nfs/file.c              |  22 ++--
> >>>>>  fs/nfs/fscache-index.c     |  94 --------------
> >>>>>  fs/nfs/fscache.c           | 315 ++++++++++++++++++++-------------------
> >>>>>  fs/nfs/fscache.h           | 132 +++++++------------
> >>>>>  fs/nfs/inode.c             |   4 +-
> >>>>>  fs/nfs/internal.h          |   8 ++
> >>>>>  fs/nfs/nfs4proc.c          |   2 +-
> >>>>>  fs/nfs/pagelist.c          |   2 +
> >>>>>  fs/nfs/read.c              | 248 ++++++++++++++++-------------------
> >>>>>  fs/nfs/write.c             |   3 +-
> >>>>>  include/linux/nfs_fs.h     |   5 +-
> >>>>>  include/linux/nfs_iostat.h |   2 +-
> >>>>>  include/linux/nfs_page.h   |   1 +
> >>>>>  include/linux/nfs_xdr.h    |   1 +
> >>>>>  16 files changed, 339 insertions(+), 504 deletions(-)
> >>>>>
> >>>>> --
> >>>>> 1.8.3.1
> >>>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
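
P.S.  Trond, Anna - to make item 1 above a bit more concrete, with the
"least invasive" option nfs_readpage() keeps roughly the shape sketched
below.  This is not the actual patch: nfs_netfs_read_ops and
nfs_readpage_uncached() are placeholder names, and I'm assuming the
fscache-iter netfs_readpage() entry point keeps its (file, page, ops,
priv) form and that nfs_i_fscache() remains the way to test whether a
cookie is attached.  The unconditional option would simply drop the
check and always go through the netfs layer.

static int nfs_example_readpage(struct file *file, struct page *page)
{
	struct inode *inode = page_file_mapping(page)->host;

	/* Route reads through the netfs layer only when fscache is
	 * enabled for this inode; otherwise use the existing path. */
	if (nfs_i_fscache(inode))
		return netfs_readpage(file, page, &nfs_netfs_read_ops, NULL);

	return nfs_readpage_uncached(file, page);	/* placeholder for existing path */
}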