On Mon, Feb 1, 2021 at 9:30 AM Anna Schumaker <anna.schumaker@xxxxxxxxxx> wrote:
>
> Hi David,
>
> On Sun, Jan 31, 2021 at 9:20 PM David Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> >
> > On Thu, Jan 28, 2021 at 9:59 AM Dave Wysochanski <dwysocha@xxxxxxxxxx> wrote:
> > >
> > > This minimal set of patches updates the NFS client to use the new
> > > readahead method and converts the NFS fscache code to use the new
> > > netfs IO API. The patches are at:
> > > https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-lib-nfs-20210128
> > > https://github.com/DaveWysochanskiRH/kernel/commit/74357eb291c9c292f3ab3bc9ed1227cb76f52c51
> > >
> > > The patches are based on David Howells' fscache-netfs-lib tree at
> > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib
> > >
> > > The first 6 patches refactor some of the NFS read code to facilitate
> > > re-use, and the next 4 patches do the conversion to the new API. Note
> > > that patch 8 converts nfs_readpages to nfs_readahead.
> > >
> > > Changes since my last posting on Jan 27, 2021
> > > - Fix oops with fscache enabled on parallel read unit test
> > > - Add patches to handle invalidate and releasepage
> > > - Use #define FSCACHE_USE_NEW_IO_API to select the new API
> > > - Minor cleanup in nfs_readahead_from_fscache
> > >
> > > Still TODO
> > > 1. Fix known bugs
> > >    a) nfs_issue_op: takes rcu_read_lock but may call nfs_page_alloc()
> > >       with GFP_KERNEL, which may sleep (dhowells noted this in a review)
> > >    b) nfs_refresh_inode() takes inode->i_lock but may call
> > >       __fscache_invalidate(), which may sleep (found with lockdep)
> > >    c) WARN with xfstest fscache/netapp/pnfs/nfs41
> >
> > Turns out this is a bit more involved, and I would not consider pNFS +
> > fscache stable right now. For now I may have to disable fscache when
> > pNFS is enabled, unless I can quickly come up with a reasonable fix
> > for the problem.
>
> So my thought right now is to take the first 6 cleanup / preparation
> patches for the 5.12 merge window and save the cutover for 5.13. This
> would give you an extra release cycle to fix the pNFS stability, and
> it would give more time to find and fix any issues in netfs before
> switching NFS over to it.
>
> Would that work?
> Anna
>

Yes, that's fine.

> >
> > The problem is as follows. Once netfs calls us in "issue_op" for a
> > given subrequest, it expects one callback when that subrequest
> > completes. The "clamp_length" function was developed so we can tell
> > the netfs caller how big an IO we can handle. However, right now it
> > only implements an 'rsize' check, and it does not take into account
> > pNFS characteristics such as layout segments, which may split the IO
> > up into multiple RPCs. Each of those RPCs has its own completion, and
> > so far I've not come up with a way to call back into netfs only when
> > the last one is done, so I am not sure what the right approach is.
> > One obvious approach would be a more sophisticated "clamp_length"
> > function which adds logic similar to the *pg_test() functions, but I
> > don't want to duplicate that, so it's not really clear.
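
For reference, below is a rough sketch of the rsize-only clamp described
above. The op name, structure fields, and signature here are
approximations based on the netfs read helper interface, not the exact
code in this series:

    /*
     * Sketch only: clamp each netfs subrequest to at most one READ
     * RPC's worth of data (rsize).  A pNFS-aware version would also
     * need to honour layout segment boundaries (roughly what the
     * ->pg_test() helpers check today) so that a single subrequest
     * never fans out into multiple RPCs, each with its own completion.
     */
    static bool nfs_netfs_clamp_length(struct netfs_read_subrequest *subreq)
    {
            struct inode *inode = subreq->rreq->mapping->host;
            struct nfs_server *server = NFS_SERVER(inode);

            subreq->len = min_t(size_t, subreq->len, server->rsize);
            return true;
    }

The alternative would be to teach a function like this about layout
segments, which is exactly the *pg_test() duplication I'd like to avoid.
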
> > > 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
> > >    * Compare with netfs stats and determine if still needed
> > > 3. Cleanup dfprintks and/or convert to tracepoints
> > > 4. Further tests (see "Not tested yet")
> > >
> > > Tests run
> > > 1. Custom NFS+fscache unit tests for basic operation: PASS
> > >    * vers=3,4.0,4.1,4.2,sec=sys,server=localhost (same kernel)
> > > 2. cthon04: PASS
> > >    * test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys
> > >    * No failures, oopses, or hangs
> > > 3. iozone tests: PASS
> > >    * nofsc,fsc,vers=3,4.0,4.1,4.2,sec=sys,server=rhel7,rhel8
> > >    * No failures, oopses, or hangs
> > > 4. xfstests/generic: PASS*
> > >    * no hangs or crashes (one WARN); failures unrelated to these patches
> > >    * Ran the following configurations:
> > >      * vers=4.1,fsc,sec=sys,rhel7-server: PASS
> > >      * vers=4.0,fsc,sec=sys,rhel7-server: PASS
> > >      * vers=3,fsc,sec=sys,rhel7-server: PASS
> > >      * vers=4.1,nofsc,sec=sys,netapp-server(pnfs/files): PASS
> > >      * vers=4.1,fsc,sec=sys,netapp-server(pnfs/files): INCOMPLETE
> > >        * WARN_ON fs/netfs/read_helper.c:616
> > >        * ran with kernel.panic_on_oops=1
> > >      * vers=4.2,fsc,sec=sys,rhel7-server: running at generic/438
> > >      * vers=4.2,fsc,sec=sys,rhel8-server: running at generic/127
> > > 5. kernel build: PASS
> > >    * vers=4.2,fsc,sec=sys,rhel8-server: PASS
> > >
> > > Not tested yet:
> > > * error injections (for example, connection disruptions, server
> > >   errors during IO, etc.)
> > > * many processes doing mixed reads/writes on the same file
> > > * performance
> > >
> > > Dave Wysochanski (10):
> > >   NFS: Clean up nfs_readpage() and nfs_readpages()
> > >   NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read succeeds
> > >   NFS: Refactor nfs_readpage() and nfs_readpage_async() to use nfs_readdesc
> > >   NFS: Call readpage_async_filler() from nfs_readpage_async()
> > >   NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
> > >   NFS: Allow internal use of read structs and functions
> > >   NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
> > >   NFS: Convert readpages to readahead and use netfs_readahead for fscache
> > >   NFS: Update releasepage to handle new fscache kiocb IO API
> > >   NFS: update various invalidation code paths for new IO API
> > >
> > >  fs/nfs/file.c              |  22 +++--
> > >  fs/nfs/fscache.c           | 230 +++++++++++++++++++------------------
> > >  fs/nfs/fscache.h           | 105 +++-----------------
> > >  fs/nfs/internal.h          |   8 ++
> > >  fs/nfs/pagelist.c          |   2 +
> > >  fs/nfs/read.c              | 240 ++++++++++++++++++++-------------------
> > >  fs/nfs/write.c             |  10 +-
> > >  include/linux/nfs_fs.h     |   5 +-
> > >  include/linux/nfs_iostat.h |   2 +-
> > >  include/linux/nfs_page.h   |   1 +
> > >  include/linux/nfs_xdr.h    |   1 +
> > >  11 files changed, 257 insertions(+), 369 deletions(-)
> > >
> > > --
> > > 1.8.3.1
> > >