On Wed, Mar 31, 2021 at 2:04 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Wed, 2021-03-31 at 13:49 -0400, David Wysochanski wrote: > > Trond, > > > > I've been working on getting NFS converted to dhowells new fscache > > and > > netfs APIs and running into a problem with how NFS is designed and it > > involves the NFS pagelist.c / pgio API. I'd appreciate it if you > > could review and give your thoughts on possible approaches. I've > > tried to outline some of the possibilities below. I tried coding > > option #3 and ran into some problems, and it has a serialization > > limitation. At this point I'm leaning towards option 2, so I'll > > probably try that approach if you don't have time for review or have > > strong thoughts on it. > > > > I am not going through another redesign of the NFS code in order to > accommodate another cachefs design. If netfs needs a refactoring or > redesign of the I/O code then it will be immediately NACKed. > I don't think it will require a redesign. I was thinking more about adding a flag to nfs_pageio_add_request() for example that would return a different value if coalesce of the page being added failed. So we'd have: nfs_pageio_init(): called 1 time nfs_pageio_add_request(): called N times, one for each page, but stop if coalesce fails nfs_pageio_complete(): called 1 time > Why does netfs need to know these details about the NFS code anyway? > We can probably get by without it but it will be awkward and probably not the best, but I'm not sure. I tried to explain below with a problem statement but maybe it was unclear. The basic design of netfs API as it pertains to this problem is: * issue_op(): calls into the specific netfs (NFS) to obtain the data from server (send one or more RPCs) * netfs_subreq_terminated(): when RPC(s) are completed, we need to call netfs API back to say the data is either there or there was an error I would note that assuming we can come up with something acceptable to NFS, it should simplify both nfs_readpage() and nfs_readpages/nfs_readhead. I hope we can find some common ground so it's neither too invasive to NFS and also maybe there's some similar improvements in NFS that can be done along with this interface. > > Thanks. > > > > > > Problem: The NFS pageio interface does not expose a max read length > > that > > we can easily use inside netfs clamp_length() function. As a result, > > when > > issue_op() is called indicating a single netfs subrequest, this can > > be > > split into > > multiple NFS subrequests / RPCs inside guts of NFS pageio code. > > Multiple > > NFS subrequests requests leads to multiple completions, and the netfs > > API expects a 1:1 mapping between issue_op() and > > netfs_subreq_terminated() calls. > > > > Details of the NFS pageio API (see include/linux/nfs_page.h and > > fs/nfs/pagelist.c) > > Details of the netfs API (see include/linux/netfs.h and > > fs/netfs/read_helper.c) > > > > The NFS pageio API 3 main calls are as follows: > > 1. nfs_pageio_init(): initialize a pageio structure (R/W IO of N > > pages) > > 2. nfs_pageio_add_request(): called for each page to add to an IO > > * Calls nfs_pageio_add_request_mirror -> __nfs_pageio_add_request > > * __nfs_pageio_add_request may call nfs_pageio_doio() which > > actually > > sends an RPC over the wire if page cannot be added to the request > > ("coalesced") due to various factors. For more details, see > > nfs_pageio_do_add_request() and all underlying code it calls such > > as nfs_coalesce_size() and subsequent pgio->pg_ops->pg_test() > > calls > > 3. nfs_pageio_complete() - "complete" the pageio > > * calls nfs_pageio_complete_mirror -> nfs_pageio_doio() > > > > The NFS pageio API thus may generate multiple over the wire RPCs > > and thus multiple completions even though at the high level only > > one call to nfs_pageio_complete() is made. > > > > Option 1: Just use NFS pageio API as is, and deal with possible > > multiple > > completions. > > - Inconsistent with netfs design intent > > - Need to keep track of the RPC completion status, and for example, > > if one completes with success and one an error, probably call > > netfs_subreq_terminated() with the error. > > - There's no way for the caller of the NFS pageio API to know how > > many RPCs and thus completions may occur. Thus, it's unclear how > > one would distinguish between a READ that resulted in a single RPC > > over the wire that completed as a short read, and a READ that > > resulted in multiple RPCs that would each complete separately, > > but would eventually complete > > > > Option 2: Create a more complex 'clamp_length()' function for NFS, > > taking into account all ways NFS / pNFS code can split a read. > > + Consistent with netfs design intent > > + Multiple "split" requests would be called in parallel (see loop > > inside netfs_readahead, which repeatedly calls > > netfs_rreq_submit_slice) > > - Looks impossible without refactoring of NFS pgio API. We need > > to prevent nfs_pageio_add_request() from calling nfs_pagio_doio(), > > and return some indication coalesce failed. In addition, it may > > run into problems with fallback from DS to MDS for example (see > > commit d9156f9f364897e93bdd98b4ad22138de18f7c24). > > > > Option 3: Utilize NETFS_SREQ_SHORT_READ flag as needed. > > + Consistent with netfs design intent > > - Multiple "split" requests would be serialized (see code > > paths inside netfs_subreq_terminated that check for this flag). > > - Looks impossible without some refactoring of NFS pgio API. > > * Notes: Terminate NFS pageio page based loop at the first call > > to nfs_pageio_doio(). When a READ completes, NFS calls > > netfs_subreq_terminated() with NETFS_SREQ_SHORT_READ > > and is prepared to have the rest of the subrequest be resubmitted. > > Need to somehow fail early or avoid entirely subsequent calls to > > nfs_pagio_doio() for the original request though, and handle > > error status only from the first RPC. > > > > Option 4: Add some final completion routine to be called near > > bottom of nfs_pageio_complete() and would pass in at least > > netfs_read_subrequest(), possibly nfs_pageio_descriptor. > > + Inconsistent with netfs design intent > > - Would be a new NFS API or call on top of everything > > - Need to handle the "multiple completion with different > > status" problem (see #1). > > > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@xxxxxxxxxxxxxxx > >