I sent the first patch in this series the other day, but didn't get any
responses. Since then I've had time to follow up on the client-side part
of this problem, which eventually exposed yet another bug on the server
side. There are also a couple of cleanup patches in here, and a patch to
add some tracepoints that I found useful while diagnosing this.

With this set on both client and server, I'm now able to run Yongcheng's
test for an hour straight with no stuck locks.

Jeff Layton (7):
  lockd: purge resources held on behalf of nlm clients when shutting
    down
  lockd: remove 2 unused helper functions
  lockd: move struct nlm_wait to lockd.h
  lockd: fix races in client GRANTED_MSG wait logic
  lockd: server should unlock lock if client rejects the grant
  nfs: move nfs_fhandle_hash to common include file
  lockd: add some client-side tracepoints

 fs/lockd/Makefile           |  6 ++-
 fs/lockd/clntlock.c         | 58 +++++++++++---------------
 fs/lockd/clntproc.c         | 42 ++++++++++++++-----
 fs/lockd/host.c             |  1 +
 fs/lockd/svclock.c          | 21 ++++++++--
 fs/lockd/trace.c            |  3 ++
 fs/lockd/trace.h            | 83 +++++++++++++++++++++++++++++++++++++
 fs/nfs/internal.h           | 15 -------
 include/linux/lockd/lockd.h | 29 ++++++-------
 include/linux/nfs.h         | 20 +++++++++
 10 files changed, 200 insertions(+), 78 deletions(-)
 create mode 100644 fs/lockd/trace.c
 create mode 100644 fs/lockd/trace.h

-- 
2.39.2