On Wed, 2021-03-31 at 13:49 -0400, David Wysochanski wrote: > Trond, > > I've been working on getting NFS converted to dhowells new fscache > and > netfs APIs and running into a problem with how NFS is designed and it > involves the NFS pagelist.c / pgio API. I'd appreciate it if you > could review and give your thoughts on possible approaches. I've > tried to outline some of the possibilities below. I tried coding > option #3 and ran into some problems, and it has a serialization > limitation. At this point I'm leaning towards option 2, so I'll > probably try that approach if you don't have time for review or have > strong thoughts on it. > I am not going through another redesign of the NFS code in order to accommodate another cachefs design. If netfs needs a refactoring or redesign of the I/O code then it will be immediately NACKed. Why does netfs need to know these details about the NFS code anyway? > Thanks. > > > Problem: The NFS pageio interface does not expose a max read length > that > we can easily use inside netfs clamp_length() function. As a result, > when > issue_op() is called indicating a single netfs subrequest, this can > be > split into > multiple NFS subrequests / RPCs inside guts of NFS pageio code. > Multiple > NFS subrequests requests leads to multiple completions, and the netfs > API expects a 1:1 mapping between issue_op() and > netfs_subreq_terminated() calls. > > Details of the NFS pageio API (see include/linux/nfs_page.h and > fs/nfs/pagelist.c) > Details of the netfs API (see include/linux/netfs.h and > fs/netfs/read_helper.c) > > The NFS pageio API 3 main calls are as follows: > 1. nfs_pageio_init(): initialize a pageio structure (R/W IO of N > pages) > 2. nfs_pageio_add_request(): called for each page to add to an IO > * Calls nfs_pageio_add_request_mirror -> __nfs_pageio_add_request > * __nfs_pageio_add_request may call nfs_pageio_doio() which > actually > sends an RPC over the wire if page cannot be added to the request > ("coalesced") due to various factors. For more details, see > nfs_pageio_do_add_request() and all underlying code it calls such > as nfs_coalesce_size() and subsequent pgio->pg_ops->pg_test() > calls > 3. nfs_pageio_complete() - "complete" the pageio > * calls nfs_pageio_complete_mirror -> nfs_pageio_doio() > > The NFS pageio API thus may generate multiple over the wire RPCs > and thus multiple completions even though at the high level only > one call to nfs_pageio_complete() is made. > > Option 1: Just use NFS pageio API as is, and deal with possible > multiple > completions. > - Inconsistent with netfs design intent > - Need to keep track of the RPC completion status, and for example, > if one completes with success and one an error, probably call > netfs_subreq_terminated() with the error. > - There's no way for the caller of the NFS pageio API to know how > many RPCs and thus completions may occur. Thus, it's unclear how > one would distinguish between a READ that resulted in a single RPC > over the wire that completed as a short read, and a READ that > resulted in multiple RPCs that would each complete separately, > but would eventually complete > > Option 2: Create a more complex 'clamp_length()' function for NFS, > taking into account all ways NFS / pNFS code can split a read. > + Consistent with netfs design intent > + Multiple "split" requests would be called in parallel (see loop > inside netfs_readahead, which repeatedly calls > netfs_rreq_submit_slice) > - Looks impossible without refactoring of NFS pgio API. We need > to prevent nfs_pageio_add_request() from calling nfs_pagio_doio(), > and return some indication coalesce failed. In addition, it may > run into problems with fallback from DS to MDS for example (see > commit d9156f9f364897e93bdd98b4ad22138de18f7c24). > > Option 3: Utilize NETFS_SREQ_SHORT_READ flag as needed. > + Consistent with netfs design intent > - Multiple "split" requests would be serialized (see code > paths inside netfs_subreq_terminated that check for this flag). > - Looks impossible without some refactoring of NFS pgio API. > * Notes: Terminate NFS pageio page based loop at the first call > to nfs_pageio_doio(). When a READ completes, NFS calls > netfs_subreq_terminated() with NETFS_SREQ_SHORT_READ > and is prepared to have the rest of the subrequest be resubmitted. > Need to somehow fail early or avoid entirely subsequent calls to > nfs_pagio_doio() for the original request though, and handle > error status only from the first RPC. > > Option 4: Add some final completion routine to be called near > bottom of nfs_pageio_complete() and would pass in at least > netfs_read_subrequest(), possibly nfs_pageio_descriptor. > + Inconsistent with netfs design intent > - Would be a new NFS API or call on top of everything > - Need to handle the "multiple completion with different > status" problem (see #1). > -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx