On 4/1/24 10:49 AM, Chuck Lever wrote:
On Mon, Apr 01, 2024 at 09:46:25AM -0700, Dai Ngo wrote:
On 4/1/24 9:00 AM, Dai Ngo wrote:
On 4/1/24 6:34 AM, Chuck Lever wrote:
On Mon, Apr 01, 2024 at 08:49:49AM -0400, Jeff Layton wrote:
On Sat, 2024-03-30 at 16:30 -0700, Dai Ngo wrote:
On 3/30/24 11:28 AM, Chuck Lever wrote:
On Sat, Mar 30, 2024 at 10:46:08AM -0700, Dai Ngo wrote:
On 3/29/24 4:42 PM, Chuck Lever wrote:
On Fri, Mar 29, 2024 at 10:57:22AM -0700, Dai Ngo wrote:
On 3/29/24 7:55 AM, Chuck Lever wrote:
It could be straightforward, however, to move the callback_wq into
struct nfs4_client so that each client can have its own workqueue.
Then we can take some time and design something less brittle and
more scalable (and maybe come up with some test infrastructure so
this stuff doesn't break as often).
IMHO I don't see why the callback workqueue has to be different
from the laundry_wq or nfsd_filecache_wq used by nfsd.
You mean, you don't see why callback_wq has to be ordered, while
the others are not so constrained? Quite possibly it does not have
to be ordered.
Yes, I looked at the all the nfsd4_callback_ops on nfsd and they
seems to take into account of concurrency and use locks appropriately.
For each type of work I don't see why one work has to wait for
the previous work to complete before proceed.
But we might have lost the bit of history that explains why, so
let's be cautious about making broad changes here until we have a
good operational understanding of the code and some robust test
cases to check any changes we make.
Understand, you make the call.
commit 88382036674770173128417e4c09e9e549f82d54
Author: J. Bruce Fields <bfields@xxxxxxxxxxxx>
Date: Mon Nov 14 11:13:43 2016 -0500
nfsd: update workqueue creation
No real change in functionality, but the old interface
seems to be
deprecated.
We don't actually care about ordering necessarily, but
we do depend on
running at most one work item at a time: nfsd4_process_cb_update()
assumes that no other thread is running it, and that no new
callbacks
are starting while it's running.
Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
...so it may be as simple as just fixing up nfsd4_process_cb_update().
Allowing parallel recalls would certainly be a good thing.
Thank you Jeff for pointing this out.
Thanks for the research. I was about to do the same.
I think we do allow parallel recalls -- IIUC, callback_wq
single-threads only the dispatch of RPC calls, not their
processing. Note the use of rpc_call_async().
So nfsd4_process_cb_update() is protecting modifications of
cl_cb_client and the backchannel transport. We might wrap that in
a mutex, for example. But I don't see strong evidence (yet) that
this design is a bottleneck when it is working properly.
However, if for some reason, a work item sleeps, that would
block forward progress of the work queue, and would be Bad (tm).
That said, a workqueue per client would be a great place to start. I
don't see any reason to serialize callbacks to different clients.
I volunteer to take care of that for v6.10.
Since you're going to make callback workqueue per client, do we still need
a fix in nfsd to shut down the callback when a client is about to enter
courtesy state and there is pending RPC calls.
I would rather just close down the transports for courtesy clients.
But that doesn't seem to be the root cause, so let's put this aside
for a bit.
With callback workqueue per client, it fixes the problem of all callbacks
hang when a job get stuck in the workqueue. The fix in nfsd prevents a
stuck job to loop until the client reconnects which might be a very long
time or never if that client is no longer used.
The question I have is will this unresponsive client cause other
issues, such as:
- a hang when the server tries to unexport
exportfs -u does not hang, but the share can not be un-exported.
or shutdown
shutdown does not hang since __destroy_client is called which calls
nfsd4_shutdown_callback to set NFSD4_CLIENT_CB_KILL.
echo "expire' > /proc/fs/nfsd/X/ctl does hang since it waits for
cl_rpc_users to drop to 0. But we can fix that by dropping the
wait_event(expiry_wq, atomic_read(&clp->cl_rpc_users) == 0) and
just go ahead and expire_client likes shutdown.
- CPU or memory consumption for each retried callback
Yes, the loop does consume CPU cycles since it's rescheduled to run
after 25ms.
-Dai
That is an ongoing concern.