> On Aug 6, 2023, at 7:07 PM, NeilBrown <neilb@xxxxxxx> wrote: > > On Fri, 04 Aug 2023, NeilBrown wrote: >> On Fri, 04 Aug 2023, Chuck Lever wrote: >>> On Wed, Aug 02, 2023 at 05:34:39PM +1000, NeilBrown wrote: >>>> svc_get_next_xprt() does a lot more than just get an xprt. It also >>>> decides if it needs to sleep, depending not only on the availability of >>>> xprts but also on the need to exit or handle external work. >>>> >>>> So rename it to svc_rqst_wait_for_work() and only do the testing and >>>> waiting. Move all the waiting-related code out of svc_recv() into the >>>> new svc_rqst_wait_for_work(). >>>> >>>> Move the dequeueing code out of svc_get_next_xprt() into svc_recv(). >>>> >>>> Previously svc_xprt_dequeue() would be called twice, once before waiting >>>> and possibly once after. Now instead rqst_should_sleep() is called >>>> twice. Once to decide if waiting is needed, and once to check against >>>> after setting the task state do see if we might have missed a wakeup. >>>> >>>> signed-off-by: NeilBrown <neilb@xxxxxxx> >>> >>> I've tested and applied this one and the previous one to the thread >>> scheduling branch, with a few minor fix-ups. Apologies for how long >>> this took, I'm wrestling with a SATA/block bug on the v6.6 test >>> system that is being very sassy and hard to nail down. >> >> I'm happy that we are making progress and the series is getting improved >> along the way. Good lock with the block bug. >> >>> >>> I need to dive into the backchannel patch next. I'm trying to think >>> of how I want to test that one. >>> >> >> I think lock-grant call backs should be easiest to trigger. >> However I just tried mounting the filesystem twice with nosharecache, >> and the locking that same file on both mounts. I expected one to block >> and hoped I would see the callback when the first lock was dropped. >> However the second lock didn't block! That's a bug. >> I haven't dug very deep yet, but I think the client is getting a >> delegation for the first open (O_RDWR) so the server doesn't see the >> lock. >> Then when the lock arrives on the second mount, there is no conflicting >> lock and the write delegation maybe isn't treated as a conflict? >> >> I'll try to look more early next week. > > The bug appears to be client-side. > When I mount the same filesystem twice using nosharecache the client > only creates a single session. One of the mounts opens the file and > gets a write delegation. The other mount opens the same file but this > doesn't trigger a delegation recall as the server thinks it is the same > client as it is using the same session. But the client is caching the > two mounts separately and not sharing the delegation. > So when the second mount locks the file the server allows it, even > though the first mount thinks it holds a lock thanks to the delegation. > > I think nosharecache needs to use a separate identity and create a > separate session. > > I repeated the test using network namespaces to create a genuinely > separate clients so the server now sees two clients opening the same file > and trying to lock it. I now see both CB_RECALL and CB_NOTIFY_LOCK > callbacks being handled correctly. Thanks for these results. I'll apply this one to the topic branch, but I'd like to get client review/ack for it first. Meanwhile, I've applied the rpc_status patches to nfsd-next, and pulled in the first half of topic-sunrpc-thread-scheduling for v6.6. -- Chuck Lever