On Sat, Dec 21, 2024 at 3:53 PM Rick Macklem <rick.macklem@xxxxxxxxx> wrote:
>
> On Sat, Dec 21, 2024 at 3:27 PM Rick Macklem <rick.macklem@xxxxxxxxx> wrote:
> >
> > On Sat, Dec 21, 2024 at 9:34 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> > >
> > > On 12/20/24 9:16 PM, J David wrote:
> > > > Hello,
> > > >
> > > > On Tue, Dec 17, 2024 at 8:51 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> > > >> If they can reproduce
> > > >> this issue with an "in tree" file system contained in a recent upstream
> > > >> Linux kernel, then we can take a look. (Or you and J. David can give it
> > > >> a try).
> > > >
> > > > Yes, I reproduced this behavior on ext4 with 6.11.5+bpo-amd64 from
> > > > Debian backports on completely different hardware.
> > > >
> > > > Then I set up another NFS server on Arch (running kernel 6.12.4), and
> > > > reproduced the issue there as well.
> > > >
> > > > Then, just to be sure, I went and found the instructions for building
> > > > the Linux kernel from source, built and tested both 6.12.6 and
> > > > 6.13-rc3 as downloaded directly from www.kernel.org, and the issue
> > > > occurs with those as well.
> > >
> > > Reproducing on v6.13-rc with ext4 is all that was necessary, thank you!
> > >
> > >
> > > > Additionally, I have tested every combination of FreeBSD, Linux and
> > > > OpenIndiana as client and server to confirm that FreeBSD client with
> > > > Linux server is the only case where this problem occurs.
> > >
> > > Interesting.
> > >
> > >
> > > > Does this count as reproducing the issue with an "in tree" file system
> > > > contained in a recent upstream Linux kernel? I'm asking sincerely; I'm
> > > > so far out of my depth that I'm pretty sure there are sea monsters
> > > > swimming around down there. So I can't rule out the possibility that
> > > > I've done something wrong either in setup or testing.
> > > >
> > > > During the course of this, I've gotten the reproduction down to
> > > > extracting a 2k tar file and then running "du" on the resulting
> > > > directory from the client. Doesn't matter if the file is untarred on
> > > > the FreeBSD client, the server, or another client. The tar file
> > > > contains a directory with a handful of random Javascript files from
> > > > Drupal. As far as I can tell, it has something to do with the number,
> > > > size, or names of the files. The Drupal project has three separate
> > > > directories all structured like this with the same filenames, but the
> > > > file contents vary. The issue occurs with all of them.
> > > >
> > > > The Linux /etc/exports file is just:
> > > >
> > > > /data 192.168.201.0/24(rw,sync)
> > > >
> > > > (The production case also uses crossmnt and no_subtree_check, anonuid,
> > > > and anongid, but I eliminated those one by one to make sure they
> > > > weren't responsible.)
> > > >
> > > > The corresponding fstab entry on the FreeBSD 14.2-RELEASE client is:
> > > >
> > > > 192.168.201.200:/data /data nfs rw,tcp,nfsv4,minorversion=2 0 0
> > >
> > > Out of curiosity, do you see the problem recur with nfsv3 or the other
> > > NFSv4 minor versions?
> > >
> > >
> > > > One additional thing I noticed that really blew my mind is that I can
> > > > shutdown both the client and the server, wait, power them back on, and
> > > > the issue is still there. So it's not something in RAM. That prompted
> > > > me to try "touch x" in the directory to create a new 0-length file.
> > > > The issue then goes away. Then I can "rm x" and the issue comes back.
> > > > By contrast, I can write megabytes from /dev/random into one of the
> > > > files without affecting anything; the issue stays the same.
> > > >
> > > > I then tried it with all empty files using the same filenames. The
> > > > issue still occurred. Add or remove one file and the issue goes away.
> > > > I then renamed one of the files to zz.js.
> > > > Issue still occurs. Renamed
> > > > it to zzz.js. Problem still occurs. Kept going until I got to
> > > > zzzzzz.js and it worked.
> > > >
> > > > Finally, I got it to the point where running this in an empty mounted
> > > > directory will create the issue:
> > > >
> > > > rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> > > > touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> > > > done; touch y0-xxxxxx.xx
> > > >
> > > > and this will not:
> > > >
> > > > rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> > > > touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> > > > done; touch y0-xxxxxxx.xx
> > > >
> > > > (The difference being one extra x in the last filename.)
> > > >
> > > > It works in the other direction as well. This causes the issue:
> > > >
> > > > rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> > > > touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> > > > done; touch y0-xxx.xx
> > > >
> > > > This does not:
> > > >
> > > > rm *.xx; for a in a b c d e f g h ; do for b in 1 2 3 4 5 6 7 ; do
> > > > touch $a$b.xx ; done; done; for a in 1 2 3 4 5; do touch x$a-xx.xx;
> > > > done; touch y0-xx.xx
> > > >
> > > > There's a four-character window involving the length of the filenames
> > > > where 62 files in a directory causes this issue. There's a little more
> > > > to it than that; it doesn't look like you can just create 61
> > > > two-letter filenames and then one really long one and get the issue.
> > > >
> > > > So I haven't found the specifics yet, but perhaps due to pure chance
> > > > this directory structure is exactly right to provoke an incredibly
> > > > obscure edge case?
> > >
> > > Well it's likely that this is a problem with READDIR, so file content
> > > is not going to be an issue. The file name lengths are the problem.
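The four-character window is suggestive when set against XDR encoding, which pads variable-length data such as file names to a multiple of 4 bytes. The sketch below regenerates the reproducer's 62 names without touching the file system and sums their XDR-padded lengths. The helper names (repro_names, xdr_pad, padded_name_bytes) and the x-count parameter are hypothetical, introduced only for illustration. This is an arithmetic observation under those assumptions, not a diagnosis; the thread has not established why the reply breaks.

```shell
#!/bin/sh
# Print the 62 file names created by the reproducer.
# $1 = number of "x" characters in the final y0-...xx name (hypothetical knob).
repro_names() {
    for a in a b c d e f g h; do
        for b in 1 2 3 4 5 6 7; do
            echo "$a$b.xx"
        done
    done
    for a in 1 2 3 4 5; do
        echo "x$a-xx.xx"
    done
    last="y0-"
    i=0
    while [ "$i" -lt "$1" ]; do
        last="${last}x"
        i=$((i + 1))
    done
    echo "$last.xx"
}

# XDR pads variable-length opaque data to a multiple of 4 bytes.
xdr_pad() {
    echo $(( ($1 + 3) / 4 * 4 ))
}

# Sum of the XDR-padded name lengths for one variant of the directory.
# This counts name bytes only, not the rest of each READDIR entry.
padded_name_bytes() {
    total=0
    for name in $(repro_names "$1"); do
        total=$(( total + $(xdr_pad "${#name}") ))
    done
    echo "$total"
}

for n in 2 3 4 5 6 7; do
    echo "x-count=$n padded-name-bytes=$(padded_name_bytes "$n")"
done
```

With these assumptions, the two failing variants quoted above (three and six x's) produce the same padded total, while the two working variants (two and seven x's) land 4 bytes below and above it. Piping `repro_names 6` into `xargs touch` in an empty directory should create the same files as the first reproducer.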
> > >
> > > Also, I'm wondering what the FreeBSD client's directory readdir
> > > arguments are (how much does it request, what are the maximum limits it
> > > negotiates, and so on). Rick?
> > As you'll see in the packet trace:
> > Sequence: cache this: No
> > Putfh: directory fh
> > Readdir:
> >     cookie: 0
> >     cookie_verf: 0
> >     dircount: 8706
> >     maxcount: 8706
> >     attr: type, RDattr_error, fileid, mounted_on_fileid
> > Getattr: same attributes as requested for a previous GETATTR, mainly
> >     to keep the directory's attribute cache up to date.
> >
> > The session negotiates a max request/reply size of just over 1Mbyte and a
> > maximum of something like 20 ops. (Can't recall, but definitely more than 4.)
> >
> > If you are wondering where the 8706 comes from, it was an estimate of how
> > much would be needed to fill an 8K buffer with the XDR translated to UFS dirents
> > by adding 512 to 8K.
> >
> > I have not yet had a chance to see if I can reproduce the problem with
> > J. David's reproducer. I will try that soon, and if I can reproduce it, I will
> > poke at it to try and figure out what is going on.
> Just fyi, I have reproduced it. Once you use J. David's little shell script to
> create the files in the directory, the Readdir RPC gets the junk reply to GETATTR
> (the count of words for the attribute bitmap in the reply is 0 instead of 2).
> You can unmount/remount it and still get the failure, assuming you do not
> mess with the directory contents.
>
> Good work finding the reproducer, J. David!
>
> I will start to poke around to see if I can figure out what the knfsd server is
> doing.
>
> Chuck, I suspect any fairly recent FreeBSD client will be sufficient to
> reproduce this, just in case you are inspired to cross over to the dark
> side and install FreeBSD somewhere.
>
> I'll post when I have more info, rick

Here's a little more info...

(A) NFSv4.0 with the following ops in a compound works (as J.
David noted)
    PUTFH
    READDIR
    GETATTR
(B) NFSv4.1/4.2 with the following ops in a compound works
    SEQUENCE
    PUTFH
    READDIR
(C) NFSv4.1/4.2 with the following ops in a compound does not work
    SEQUENCE
    PUTFH
    READDIR
    GETATTR

Note that all I did for (B) was remove the GETATTR from the compound
and that (A) uses the same PUTFH, READDIR and GETATTR as (C).

Btw J. David, the patch I sent you that removes GETATTR from the RPC does
seem to be a workaround.

rick

>
> >
> > rick
> >
> > >
> > > Since this isn't reproducible (yet) with a Linux client, let's try
> > > another set of network captures, and you can send these to me
> > > privately.
> > >
> > > Start the capture
> > > Mount
> > > Run one of the reproducers above
> > > Unmount
> > > Stop the capture
> > >
> > > I'd like to see one with v6.13-rc3 and ext4 that works as expected, and
> > > one with the same configuration that fails.
> > >
> > > --
> > > Chuck Lever
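
The five capture steps requested above could be scripted roughly as follows on the FreeBSD client. This is a dry-run sketch that only prints the commands, so it can be reviewed before anything runs as root. The interface name, output path, test directory and reproducer.sh are hypothetical placeholders; the server address and mount options are copied from the fstab line quoted earlier in the thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the capture plan instead of running it.
# IFACE, OUT, TESTDIR and /root/reproducer.sh are hypothetical placeholders;
# SERVER and the mount options are taken from the fstab line in the thread.
IFACE="${IFACE:-em0}"
SERVER="${SERVER:-192.168.201.200}"
OUT="${OUT:-/tmp/readdir.pcap}"
TESTDIR="${TESTDIR:-/data/repro}"

capture_steps() {
    # 1. Start the capture in the background and remember its PID.
    echo "tcpdump -i $IFACE -s 0 -w $OUT host $SERVER &"
    echo 'TCPDUMP_PID=$!'
    # 2. Mount with the same options as the failing client.
    echo "mount -t nfs -o rw,tcp,nfsv4,minorversion=2 $SERVER:/data /data"
    # 3. Run a reproducer in an empty directory, then walk it with du.
    echo "(cd $TESTDIR && sh /root/reproducer.sh && du .)"
    # 4. Unmount.
    echo "umount /data"
    # 5. Stop the capture.
    echo 'kill "$TCPDUMP_PID"'
}

capture_steps
```

Running the plan once with a failing final file name and once with a working one would give the pair of captures asked for.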