On 5/23/22 8:40 AM, J. Bruce Fields wrote:
On Mon, May 02, 2022 at 06:38:03PM -0700, dai.ngo@xxxxxxxxxx wrote:
On 5/2/22 6:21 PM, J. Bruce Fields wrote:
On Mon, May 02, 2022 at 09:12:52PM -0400, J. Bruce Fields wrote:
Looks good to me.
And the only new test failures are due to the new DELAYs on OPEN.
Somebody'll need to fix up pynfs. (I'm not volunteering for now.)
I will fix it, since I broke it :-)
By the way, I have three more notes on courtesy server stuff that I
wanted to dump into email before I forget them:
1. I do still recommend fixing up those pynfs failures. The ones I see
are in RENEW3, LKU10, CLOSE9, CLOSE8, but there may be others.
I have the pynfs fix ready; I was just waiting for the courteous server
patches to go into 5.19 before submitting it. Or do you want me to send
it out now?
2. In the lock case, nfsd4_lock() holds an st_mutex while calling
vfs_lock_file(), which may end up needing to wait for the laundromat.
As I said in review, I don't see a potential deadlock there, so I'm
fine with the code going in as is.
But, as a note for possible cleanup, or if this does turn into a
problem later: vfs_lock_file could return to nfsd4_lock(), and
nfsd4_lock() could easily drop the st_mutex, wait, and retry.
I think the only tricky part would be deciding on conventions for the
caller to tell vfs_lock_file() that it shouldn't wait in this case
(non-nfsd callers will still want to wait), and for vfs_lock_file()
to indicate the caller needs to retry. Probably something in
fl_flags for the former, and an agreed-on error return for the
latter?
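To sketch that convention, here is a rough, self-contained userspace
model (not kernel code; LK_NO_WAIT_FOR_EXPIRY, try_lock_file(), and
nfsd4_lock_model() are invented names, and a pthread mutex merely
stands in for st_mutex). The only point is the shape of the idea:
return an agreed-on error instead of waiting, and let the caller drop
its mutex, wait, and retry.

/*
 * Rough userspace model of the suggested convention (not kernel code).
 * All names here are hypothetical.
 */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define LK_NO_WAIT_FOR_EXPIRY	0x01	/* models a new fl_flags-style bit */

static pthread_mutex_t st_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool conflicting_client_expired;	/* set once the "laundromat" has run */

/* Models vfs_lock_file(): refuse to wait if the caller asked us not to. */
static int try_lock_file(int flags)
{
	if (!conflicting_client_expired) {
		if (flags & LK_NO_WAIT_FOR_EXPIRY)
			return -EAGAIN;	/* caller must drop locks and retry */
		/* a non-nfsd caller would simply wait here as before */
	}
	return 0;			/* lock granted */
}

/* Models nfsd4_lock(): never wait for expiry while holding st_mutex. */
static int nfsd4_lock_model(void)
{
	int err;

	for (;;) {
		pthread_mutex_lock(&st_mutex);
		err = try_lock_file(LK_NO_WAIT_FOR_EXPIRY);
		pthread_mutex_unlock(&st_mutex);
		if (err != -EAGAIN)
			return err;
		/* st_mutex is not held while we wait */
		sleep(1);
		conflicting_client_expired = true;	/* pretend expiry completed */
	}
}

int main(void)
{
	printf("lock result: %d\n", nfsd4_lock_model());
	return 0;
}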
3. One other piece of future work would be optimizing the conflicting
lock case. A very premature optimization at this point, but I'm just
leaving my notes here in case someone's interested:
The loop in posix_lock_inode() is currently O(N^2) in the number of
expirable clients holding conflicting locks, because each time we
encounter one, we wait and then restart. In practice I doubt that
matters--if you have a lot of clients to expire, the time rescanning
the list will likely be trivial compared to the time spent waiting
for nfsdcld to commit the expiry of each client to stable storage.
*However*, it might be a more significant optimization if we first
allowed more parallelism in nfsdcld. And that might also benefit
some other cases (e.g., lots of clients reconnecting after a crash).
We'd need parallel nfsdcld--no idea what that would involve--and I
think it'd also help to update the kernel<->nfsdcld protocol with a
separate commit operation, so that nfsd could issue a bunch of client
changes and then a single commit to wait for them all.
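As a rough sketch of what that batching could look like, here is a
userspace model only: cld_queue_remove() and cld_commit() are
hypothetical names, and the current kernel<->nfsdcld protocol has no
such commit operation. The idea is simply to queue several record
changes and pay for stable storage once.

/*
 * Userspace model: write several client-record changes without forcing
 * each one to stable storage, then issue one commit (here, one fsync)
 * that waits for all of them.  All names are hypothetical.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Queue one record change; don't wait for it to reach stable storage yet. */
static int cld_queue_remove(int fd, const char *clientid)
{
	char buf[64];
	int len = snprintf(buf, sizeof(buf), "remove %s\n", clientid);

	return write(fd, buf, len) == len ? 0 : -1;
}

/* Single commit: one fsync covers every change queued so far. */
static int cld_commit(int fd)
{
	return fsync(fd);
}

int main(void)
{
	const char *clients[] = { "client-a", "client-b", "client-c" };
	int fd = open("cld-state.txt", O_WRONLY | O_CREAT | O_APPEND, 0600);

	if (fd < 0)
		return 1;
	for (int i = 0; i < 3; i++)
		cld_queue_remove(fd, clients[i]);	/* no per-client wait */
	cld_commit(fd);					/* wait once for all of them */
	close(fd);
	return 0;
}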
That done, we could modify the loop in vfs_lock_file() so that, in
the case where multiple clients hold conflicting locks, the loop
marks them all for expiry in one pass, then waits just once at the
end.
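A rough userspace model of that one-pass approach (again, not kernel
code; struct lock_entry, mark_conflicting_for_expiry(), and
wait_for_expiries() are invented for illustration) might look like
this: one scan marks every expirable conflicting owner, and a single
wait at the end replaces the per-client wait-and-rescan.

/*
 * Userspace model of the one-pass idea.  All names are hypothetical.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct client {
	int id;
	bool courtesy;	/* expirable: no lease renewal lately */
	bool marked;	/* scheduled for expiry */
};

struct lock_entry {
	struct client *owner;
	bool conflicts;	/* conflicts with the lock being requested */
};

/* Single pass over the lock list: mark every expirable conflicting owner. */
static size_t mark_conflicting_for_expiry(struct lock_entry *locks, size_t n)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		struct client *c = locks[i].owner;

		if (locks[i].conflicts && c->courtesy && !c->marked) {
			c->marked = true;	/* e.g. queue an expiry request */
			marked++;
		}
	}
	return marked;
}

/* Stand-in for one wait on all queued expiries (e.g. a single commit to
 * a parallel nfsdcld rather than one synchronous round trip per client). */
static void wait_for_expiries(void)
{
	/* nothing to model here */
}

int main(void)
{
	struct client c1 = { 1, true, false }, c2 = { 2, true, false };
	struct lock_entry locks[] = {
		{ &c1, true }, { &c2, true }, { &c1, false },
	};

	size_t n = mark_conflicting_for_expiry(locks, 3);
	wait_for_expiries();
	printf("marked %zu clients, waited once\n", n);
	return 0;
}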
Thank you for your notes, Bruce. I will keep these in mind.
-Dai
--b.