RFC: Approaches to resolve netfs API interface to NFS multiple completions problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Trond,

I've been working on getting NFS converted to dhowells new fscache and
netfs APIs and running into a problem with how NFS is designed and it
involves the NFS pagelist.c / pgio API.  I'd appreciate it if you
could review and give your thoughts on possible approaches.  I've
tried to outline some of the possibilities below.  I tried coding
option #3 and ran into some problems, and it has a serialization
limitation.  At this point I'm leaning towards option 2, so I'll
probably try that approach if you don't have time for review or have
strong thoughts on it.

Thanks.


Problem: The NFS pageio interface does not expose a max read length that
we can easily use inside netfs clamp_length() function.  As a result, when
issue_op() is called indicating a single netfs subrequest, this can be
split into
multiple NFS subrequests / RPCs inside guts of NFS pageio code.  Multiple
NFS subrequests requests leads to multiple completions, and the netfs
API expects a 1:1 mapping between issue_op() and
netfs_subreq_terminated() calls.

Details of the NFS pageio API (see include/linux/nfs_page.h and
fs/nfs/pagelist.c)
Details of the netfs API (see include/linux/netfs.h and fs/netfs/read_helper.c)

The NFS pageio API 3 main calls are as follows:
1. nfs_pageio_init(): initialize a pageio structure (R/W IO of N pages)
2. nfs_pageio_add_request(): called for each page to add to an IO
* Calls nfs_pageio_add_request_mirror -> __nfs_pageio_add_request
  * __nfs_pageio_add_request may call nfs_pageio_doio() which actually
    sends an RPC over the wire if page cannot be added to the request
    ("coalesced") due to various factors.  For more details, see
    nfs_pageio_do_add_request() and all underlying code it calls such
    as nfs_coalesce_size() and subsequent pgio->pg_ops->pg_test() calls
3. nfs_pageio_complete() - "complete" the pageio
* calls nfs_pageio_complete_mirror -> nfs_pageio_doio()

The NFS pageio API thus may generate multiple over the wire RPCs
and thus multiple completions even though at the high level only
one call to nfs_pageio_complete() is made.

Option 1: Just use NFS pageio API as is, and deal with possible multiple
completions.
- Inconsistent with netfs design intent
- Need to keep track of the RPC completion status, and for example,
if one completes with success and one an error, probably call
netfs_subreq_terminated() with the error.
- There's no way for the caller of the NFS pageio API to know how
many RPCs and thus completions may occur.  Thus, it's unclear how
one would distinguish between a READ that resulted in a single RPC
over the wire that completed as a short read, and a READ that
resulted in multiple RPCs that would each complete separately,
but would eventually complete

Option 2: Create a more complex 'clamp_length()' function for NFS,
taking into account all ways NFS / pNFS code can split a read.
+ Consistent with netfs design intent
+ Multiple "split" requests would be called in parallel (see loop
inside netfs_readahead, which repeatedly calls netfs_rreq_submit_slice)
- Looks impossible without refactoring of NFS pgio API.  We need
to prevent nfs_pageio_add_request() from calling nfs_pagio_doio(),
and return some indication coalesce failed.  In addition, it may
run into problems with fallback from DS to MDS for example (see
commit d9156f9f364897e93bdd98b4ad22138de18f7c24).

Option 3: Utilize NETFS_SREQ_SHORT_READ flag as needed.
+ Consistent with netfs design intent
- Multiple "split" requests would be serialized (see code
paths inside netfs_subreq_terminated that check for this flag).
- Looks impossible without some refactoring of NFS pgio API.
* Notes: Terminate NFS pageio page based loop at the first call
to nfs_pageio_doio().  When a READ completes, NFS calls
netfs_subreq_terminated() with NETFS_SREQ_SHORT_READ
and is prepared to have the rest of the subrequest be resubmitted.
Need to somehow fail early or avoid entirely subsequent calls to
nfs_pagio_doio() for the original request though, and handle
error status only from the first RPC.

Option 4: Add some final completion routine to be called near
bottom of nfs_pageio_complete() and would pass in at least
netfs_read_subrequest(), possibly nfs_pageio_descriptor.
+ Inconsistent with netfs design intent
- Would be a new NFS API or call on top of everything
- Need to handle the "multiple completion with different
status" problem (see #1).




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux