On Mon, May 02, 2022 at 06:38:03PM -0700, dai.ngo@xxxxxxxxxx wrote:
> 
> On 5/2/22 6:21 PM, J. Bruce Fields wrote:
> >On Mon, May 02, 2022 at 09:12:52PM -0400, J. Bruce Fields wrote:
> >>Looks good to me.
> >And the only new test failures are due to the new DELAYs on OPEN.
> >Somebody'll need to fix up pynfs.  (I'm not volunteering for now.)
> 
> I will fix it, since I broke it :-)

By the way, I have three more notes on courtesy server stuff that I
wanted to dump into email before I forget them:

1. I do still recommend fixing up those pynfs failures.  The ones I
see are in RENEW3, LKU10, CLOSE9, and CLOSE8, but there may be others.

2. In the lock case, nfsd4_lock() holds an st_mutex while calling
vfs_lock_file(), which may end up needing to wait for the laundromat.
As I said in review, I don't see a potential deadlock there, so I'm
fine with the code going in as is.  But, as a note for possible
cleanup, or in case this does turn into a problem later:
vfs_lock_file() could return to nfsd4_lock(), and nfsd4_lock() could
easily drop the st_mutex, wait, and retry.  I think the only tricky
part would be deciding on conventions for the caller to tell
vfs_lock_file() that it shouldn't wait in this case (non-nfsd callers
will still want to wait), and for vfs_lock_file() to indicate that the
caller needs to retry.  Probably something in fl_flags for the former,
and an agreed-on error return for the latter?  (Rough sketch appended
below.)

3. One other piece of future work would be optimizing the conflicting
lock case.  This is a very premature optimization at this point, but
I'm just leaving my notes here in case someone's interested:

The loop in posix_lock_inode() is currently O(N^2) in the number of
expirable clients holding conflicting locks, because each time we
encounter one, we wait and then restart.  In practice I doubt that
matters--if you have a lot of clients to expire, the time spent
rescanning the list will likely be trivial compared to the time spent
waiting for nfsdcld to commit the expiry of each client to stable
storage.

*However*, it might become a more significant optimization if we first
allowed more parallelism in nfsdcld.  And that might also benefit some
other cases (e.g., lots of clients reconnecting after a crash).  We'd
need a parallel nfsdcld--no idea what that would involve--and I think
it would also help to update the kernel<->nfsdcld protocol with a
separate commit operation, so that nfsd could issue a bunch of client
changes and then a single commit to wait for them all.

That done, we could modify the loop in vfs_lock_file() so that, in the
case where multiple clients hold conflicting locks, the loop marks
them all for expiry in one pass, then waits just once at the end.  (A
rough sketch of that is appended below as well.)

--b.
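
Appendix, sketch for note 2: a very rough, untested (not even compiled)
illustration of the drop-and-retry idea.  FL_EXPIRE_NONBLOCK and
EXPIRABLE_CONFLICT are names I'm making up here for the fl_flags bit
and the agreed-on error return we'd have to settle on, and
wait_for_laundromat() is a placeholder for whatever wait mechanism we
end up with:

/* made-up names, just for this sketch: */
#define FL_EXPIRE_NONBLOCK	0x1000	/* fl_flags: don't wait for client expiry */
#define EXPIRABLE_CONFLICT	531	/* conflicting lock held by an expirable client; retry */

static __be32
nfsd4_lock_retry_sketch(struct nfs4_ol_stateid *stp, struct file *filp,
			struct file_lock *fl)
{
	int err;

	/* tell the locks code not to wait for courtesy-client expiry: */
	fl->fl_flags |= FL_EXPIRE_NONBLOCK;

	for (;;) {
		err = vfs_lock_file(filp, F_SETLK, fl, NULL);
		if (err != -EXPIRABLE_CONFLICT)
			break;
		/*
		 * A courtesy client holds a conflicting lock: drop the
		 * stateid mutex so we don't hold it across the wait,
		 * let the laundromat expire the client, then retry.
		 */
		mutex_unlock(&stp->st_mutex);
		wait_for_laundromat();		/* placeholder */
		mutex_lock(&stp->st_mutex);
	}
	return nfserrno(err);
}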
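
And an equally rough sketch of the one-pass mark-then-wait idea from
note 3.  lm_mark_expirable() and wait_for_client_expiry() are made-up
names (not the hooks in the current series), and I've left out the
flc_lock/file_rwsem handling for brevity:

static bool expire_all_conflicting_clients(struct file_lock_context *ctx,
					   struct file_lock *request)
{
	struct file_lock *fl;
	bool marked = false;

	/* Pass 1: flag every expirable client holding a conflicting lock. */
	list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
		if (!posix_locks_conflict(request, fl))
			continue;
		if (fl->fl_lmops && fl->fl_lmops->lm_mark_expirable)
			marked |= fl->fl_lmops->lm_mark_expirable(fl);
	}

	/*
	 * Pass 2: wait just once for nfsd/nfsdcld to commit all of those
	 * expiries; the caller then rescans for conflicts a single time
	 * instead of restarting after each individual client.
	 */
	if (marked)
		wait_for_client_expiry();	/* placeholder */
	return marked;
}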