On 5/23/22 8:40 AM, J. Bruce Fields wrote:
On Mon, May 02, 2022 at 06:38:03PM -0700, dai.ngo@xxxxxxxxxx wrote:
On 5/2/22 6:21 PM, J. Bruce Fields wrote:
On Mon, May 02, 2022 at 09:12:52PM -0400, J. Bruce Fields wrote:
Looks good to me.
And the only new test failures are due to the new DELAYs on OPEN.
Somebody'll need to fix up pynfs. (I'm not volunteering for now.)
I will fix it, since I broke it :-)
By the way, I have three more notes on courtesy server stuff that I
wanted to dump into email before I forget them:
1. I do still recommend fixing up those pynfs failures. The ones I see
are in RENEW3, LKU10, CLOSE9, CLOSE8, but there may be others.
I have the pynfs fix ready; I was just waiting for the courteous server
patches to go into 5.19 before submitting it. Or do you want me to send
it out now?
2. In the lock case, nfsd4_lock() holds an st_mutex while calling
vfs_lock_file(), which may end up needing to wait for the laundromat.
As I said in review, I don't see a potential deadlock there, so I'm
fine with the code going in as is.
But, as a note for possible cleanup, or if this does turn into a
problem later: vfs_lock_file could return to nfsd4_lock(), and
nfsd4_lock() could easily drop the st_mutex, wait, and retry.
I think the only tricky part would be deciding on conventions for the
caller to tell vfs_lock_file() that it shouldn't wait in this case
(non-nfsd callers will still want to wait), and for vfs_lock_file()
to indicate the caller needs to retry. Probably something in
fl_flags for the former, and an agreed-on error return for the
latter?
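To sketch that convention, here is a rough, self-contained userspace
model (not kernel code; LK_NO_WAIT_FOR_EXPIRY, try_lock_file(), and
nfsd4_lock_model() are invented names, and a pthread mutex merely
stands in for st_mutex). The only point is the shape of the idea:
return an agreed-on error instead of waiting, and let the caller drop
its mutex, wait, and retry.

/*
 * Rough userspace model of the suggested convention (not kernel code).
 * All names here are hypothetical.
 */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define LK_NO_WAIT_FOR_EXPIRY	0x01	/* models a new fl_flags-style bit */

static pthread_mutex_t st_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool conflicting_client_expired;	/* set once the "laundromat" has run */

/* Models vfs_lock_file(): refuse to wait if the caller asked us not to. */
static int try_lock_file(int flags)
{
	if (!conflicting_client_expired) {
		if (flags & LK_NO_WAIT_FOR_EXPIRY)
			return -EAGAIN;	/* caller must drop locks and retry */
		/* a non-nfsd caller would simply wait here as before */
	}
	return 0;			/* lock granted */
}

/* Models nfsd4_lock(): never wait for expiry while holding st_mutex. */
static int nfsd4_lock_model(void)
{
	int err;

	for (;;) {
		pthread_mutex_lock(&st_mutex);
		err = try_lock_file(LK_NO_WAIT_FOR_EXPIRY);
		pthread_mutex_unlock(&st_mutex);
		if (err != -EAGAIN)
			return err;
		/* st_mutex is not held while we wait */
		sleep(1);
		conflicting_client_expired = true;	/* pretend expiry completed */
	}
}

int main(void)
{
	printf("lock result: %d\n", nfsd4_lock_model());
	return 0;
}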
3. One other piece of future work would be optimizing the conflicting
lock case. A very premature optimization at this point, but I'm just
leaving my notes here in case someone's interested:
The loop in posix_lock_inode() is currently O(N^2) in the number of
expirable clients holding conflicting locks, because each time we
encounter one, we wait and then restart. In practice I doubt that
matters--if you have a lot of clients to expire, the time rescanning
the list will likely be trivial compared to the time spent waiting
for nfsdcld to commit the expiry of each client to stable storage.
*However*, it might be a more significant optimization if we first
allowed more parallelism in nfsdcld. And that might also benefit
some other cases (e.g., lots of clients reconnecting after a crash).
We'd need parallel nfsdcld--no idea what that would involve--and I
think it'd also help to update the kernel<->nfsdcld protocol with a
separate commit operation, so that nfsd could issue a bunch of client
changes and then a single commit to wait for them all.
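As a rough sketch of what that batching could look like, here is a
userspace model only: cld_queue_remove() and cld_commit() are
hypothetical names, and the current kernel<->nfsdcld protocol has no
such commit operation. The idea is simply to queue several record
changes and pay for stable storage once.

/*
 * Userspace model: write several client-record changes without forcing
 * each one to stable storage, then issue one commit (here, one fsync)
 * that waits for all of them.  All names are hypothetical.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Queue one record change; don't wait for it to reach stable storage yet. */
static int cld_queue_remove(int fd, const char *clientid)
{
	char buf[64];
	int len = snprintf(buf, sizeof(buf), "remove %s\n", clientid);

	return write(fd, buf, len) == len ? 0 : -1;
}

/* Single commit: one fsync covers every change queued so far. */
static int cld_commit(int fd)
{
	return fsync(fd);
}

int main(void)
{
	const char *clients[] = { "client-a", "client-b", "client-c" };
	int fd = open("cld-state.txt", O_WRONLY | O_CREAT | O_APPEND, 0600);

	if (fd < 0)
		return 1;
	for (int i = 0; i < 3; i++)
		cld_queue_remove(fd, clients[i]);	/* no per-client wait */
	cld_commit(fd);					/* wait once for all of them */
	close(fd);
	return 0;
}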
That done, we could modify the loop in vfs_lock_file() so that, in
the case where multiple clients hold conflicting locks, the loop
marks them all for expiry in one pass, then waits just once at the
end.
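A rough userspace model of that one-pass approach (again, not kernel
code; struct lock_entry, mark_conflicting_for_expiry(), and
wait_for_expiries() are invented for illustration) might look like
this: one scan marks every expirable conflicting owner, and a single
wait at the end replaces the per-client wait-and-rescan.

/*
 * Userspace model of the one-pass idea.  All names are hypothetical.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct client {
	int id;
	bool courtesy;	/* expirable: no lease renewal lately */
	bool marked;	/* scheduled for expiry */
};

struct lock_entry {
	struct client *owner;
	bool conflicts;	/* conflicts with the lock being requested */
};

/* Single pass over the lock list: mark every expirable conflicting owner. */
static size_t mark_conflicting_for_expiry(struct lock_entry *locks, size_t n)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		struct client *c = locks[i].owner;

		if (locks[i].conflicts && c->courtesy && !c->marked) {
			c->marked = true;	/* e.g. queue an expiry request */
			marked++;
		}
	}
	return marked;
}

/* Stand-in for one wait on all queued expiries (e.g. a single commit to
 * a parallel nfsdcld rather than one synchronous round trip per client). */
static void wait_for_expiries(void)
{
	/* nothing to model here */
}

int main(void)
{
	struct client c1 = { 1, true, false }, c2 = { 2, true, false };
	struct lock_entry locks[] = {
		{ &c1, true }, { &c2, true }, { &c1, false },
	};

	size_t n = mark_conflicting_for_expiry(locks, 3);
	wait_for_expiries();
	printf("marked %zu clients, waited once\n", n);
	return 0;
}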
Thank you for your notes, Bruce. I will keep these in mind.
-Dai
--b.