Trond, I've been working on getting NFS converted to dhowells new fscache and netfs APIs and running into a problem with how NFS is designed and it involves the NFS pagelist.c / pgio API. I'd appreciate it if you could review and give your thoughts on possible approaches. I've tried to outline some of the possibilities below. I tried coding option #3 and ran into some problems, and it has a serialization limitation. At this point I'm leaning towards option 2, so I'll probably try that approach if you don't have time for review or have strong thoughts on it. Thanks. Problem: The NFS pageio interface does not expose a max read length that we can easily use inside netfs clamp_length() function. As a result, when issue_op() is called indicating a single netfs subrequest, this can be split into multiple NFS subrequests / RPCs inside guts of NFS pageio code. Multiple NFS subrequests requests leads to multiple completions, and the netfs API expects a 1:1 mapping between issue_op() and netfs_subreq_terminated() calls. Details of the NFS pageio API (see include/linux/nfs_page.h and fs/nfs/pagelist.c) Details of the netfs API (see include/linux/netfs.h and fs/netfs/read_helper.c) The NFS pageio API 3 main calls are as follows: 1. nfs_pageio_init(): initialize a pageio structure (R/W IO of N pages) 2. nfs_pageio_add_request(): called for each page to add to an IO * Calls nfs_pageio_add_request_mirror -> __nfs_pageio_add_request * __nfs_pageio_add_request may call nfs_pageio_doio() which actually sends an RPC over the wire if page cannot be added to the request ("coalesced") due to various factors. For more details, see nfs_pageio_do_add_request() and all underlying code it calls such as nfs_coalesce_size() and subsequent pgio->pg_ops->pg_test() calls 3. nfs_pageio_complete() - "complete" the pageio * calls nfs_pageio_complete_mirror -> nfs_pageio_doio() The NFS pageio API thus may generate multiple over the wire RPCs and thus multiple completions even though at the high level only one call to nfs_pageio_complete() is made. Option 1: Just use NFS pageio API as is, and deal with possible multiple completions. - Inconsistent with netfs design intent - Need to keep track of the RPC completion status, and for example, if one completes with success and one an error, probably call netfs_subreq_terminated() with the error. - There's no way for the caller of the NFS pageio API to know how many RPCs and thus completions may occur. Thus, it's unclear how one would distinguish between a READ that resulted in a single RPC over the wire that completed as a short read, and a READ that resulted in multiple RPCs that would each complete separately, but would eventually complete Option 2: Create a more complex 'clamp_length()' function for NFS, taking into account all ways NFS / pNFS code can split a read. + Consistent with netfs design intent + Multiple "split" requests would be called in parallel (see loop inside netfs_readahead, which repeatedly calls netfs_rreq_submit_slice) - Looks impossible without refactoring of NFS pgio API. We need to prevent nfs_pageio_add_request() from calling nfs_pagio_doio(), and return some indication coalesce failed. In addition, it may run into problems with fallback from DS to MDS for example (see commit d9156f9f364897e93bdd98b4ad22138de18f7c24). Option 3: Utilize NETFS_SREQ_SHORT_READ flag as needed. + Consistent with netfs design intent - Multiple "split" requests would be serialized (see code paths inside netfs_subreq_terminated that check for this flag). - Looks impossible without some refactoring of NFS pgio API. * Notes: Terminate NFS pageio page based loop at the first call to nfs_pageio_doio(). When a READ completes, NFS calls netfs_subreq_terminated() with NETFS_SREQ_SHORT_READ and is prepared to have the rest of the subrequest be resubmitted. Need to somehow fail early or avoid entirely subsequent calls to nfs_pagio_doio() for the original request though, and handle error status only from the first RPC. Option 4: Add some final completion routine to be called near bottom of nfs_pageio_complete() and would pass in at least netfs_read_subrequest(), possibly nfs_pageio_descriptor. + Inconsistent with netfs design intent - Would be a new NFS API or call on top of everything - Need to handle the "multiple completion with different status" problem (see #1).