> On Sep 6, 2019, at 4:47 PM, Jason L Tibbitts III <tibbs@xxxxxxxxxxx> wrote:
>
>>>>>> "JBF" == J Bruce Fields <bfields@xxxxxxxxxxxx> writes:
>
> JBF> Those readdir changes were client-side, right? Based on that I'd
> JBF> been assuming a client bug, but maybe it'd be worth getting a full
> JBF> packet capture of the readdir reply to make sure it's legit.
>
> I have been working with bcodding on IRC for the past couple of days on
> this. Fortunately I was able to come up with a way to fill up a directory
> in such a way that it will fail with certainty and as a bonus doesn't
> include any user data so I can feel OK about sharing packet captures. I
> have a capture alongside a kernel trace of the problematic operation in
> https://www.math.uh.edu/~tibbs/nfs/. Not that I can particularly tell
> anything useful from that, but bcodding says that it seems to point to
> some issue in sunrpc.
>
> And because I can easily reproduce this and I was able to do a bisect:
>
> 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d is the first bad commit
> commit 2c94b8eca1a26cd46010d6e73a23da5f2e93a19d
> Author: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Date:   Mon Feb 11 11:25:41 2019 -0500
>
>     SUNRPC: Use au_rslack when computing reply buffer size
>
>     au_rslack is significantly smaller than (au_cslack << 2). Using
>     that value results in smaller receive buffers. In some cases this
>     eliminates an extra segment in Reply chunks (RPC/RDMA).
>
>     Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>     Signed-off-by: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
>
> :040000 040000 d4d1ce2fbe0035c5bd9df976b8c448df85dcb505 7011a792dfe72ff9cd70d66e45d353f3d7817e3e M net
>
> But of course, I can't say whether this is the actual bad commit or
> whether it just introduced a behavior change which alters the conditions
> under which the problem appears.

The first place I'd start looking is the XDR constants at the head of
fs/nfs/nfs4xdr.c having to do with READDIR.
The report of behavior changes with the use of krb5p also makes this
commit plausible.

> And just to make sure that the blame doesn't lie with the old RHEL7
> kernel, I rsynced over the problematic directory to a machine running
> something slightly more modern (5.1.11, which I know I need to update,
> but it's already set up to do kerberised NFS) and the same problem
> exists, though the directory listing does fail at a different place.
>
> - J<

--
Chuck Lever