Rick Macklem wrote:
> Trond Myklebust wrote:
> > On Mon, 2021-11-08 at 02:27 +0000, Rick Macklem wrote:
> > > Trond Myklebust wrote:
> > > > On Sun, 2021-11-07 at 00:03 +0000, Rick Macklem wrote:
> > > > > Hi,
> > > > >
> > > > > I ran a simple test using a Linux 5.12 client NFSv4.2 mount
> > > > > against a FreeBSD pNFS server, where the DS is out of space
> > > > > (intentionally, by creating a large file on it).
> > > > >
> > > > > I tried to write a file on the Linux NFS client mount and the
> > > > > mount point gets "stuck" (will not <ctrl>C nor "umount -f").
> > > > > --> The client is attempting writes against the DS repeatedly,
> > > > >     with the DS replying NFS4ERR_NOSPC. (Same byte offsets,
> > > > >     over and over and over again.)
> > > > > --> The client is repeatedly sending RPCs with LayoutError in
> > > > >     them to the MDS, reporting the NFS4ERR_NOSPC.
> > > > >
> > > > > I'll leave it up to others, but failing the program trying to
> > > > > write the file with ENOSPC would seem preferable to the
> > > > > "stuck" mount?
> > > > > --> Removing the large file from the DS so that the Writes
> > > > >     can succeed does cause the client to recover.
> > > > >
> > > >
> > > > The client expectation is that the MDS will either remedy the
> > > > situation, or it will return an appropriate application-level error to
> > > > the LAYOUTGET.
> > > Thanks Trond, that worked fine for NFSv4.2. I tweaked the pNFS server
> > > to reply NFS4ERR_NOSPC to LayoutGet and that worked fine.
> > > (This is triggered by the LayoutError.)
> > >
> > > For NFSv4.1, things don't work as well, since there is no LayoutError
> > > operation. The LayoutReturn has the NFS4ERR_NOSPC error in it,
> > > but that doesn't happen until it finishes (which doesn't happen until
> > > I free up space on the DS).
> >
> > Hmm... The ENOSPC error from the DS should in principle be marking the
> > layout for return. You're saying that the return isn't happening?
> Not until the end, after I have deleted the large file, so there is space on the
> DS for the writes. It is in the same compound as Close.
> The packet capture is here, in case you are interested:
> https://people.freebsd.org/~rmacklem/linux-ds-out-of-space.pcap
> (Taken at the MDS, so it doesn't show the DS RPCs, but they're just
> a lot of writes that fail with NFS4ERR_NOSPC until near the end.)
>
> If you look, you'll see it gets a layout for the entire file first,
> then it repeatedly does LayoutGets that are a little weird.
> - For 4K only, but always for an offset that is an exact multiple
>   of 1Mbyte.
> --> Then, once I free up space on the DS, it does the compound
>     that includes both Close and LayoutReturn (which has the
>     NFS4ERR_NOSPC error report in it).
>
> > Does a newer client fix the issue?
> This was 5.12. I'll build/test a newer kernel in the next couple of
> days and report back (it's an old single core i386, so it takes a while;-).
5.15.1 exhibits the same behaviour. The only difference is that LayoutReturn
was in a separate RPC from Close, but it still didn't happen until the end,
after I freed up space on the DS and the writes to the DS succeeded.
(This time I had delegations enabled, which might be why the LayoutReturn
wasn't in the same compound RPC as Close?)

rick

> rick
>
> > > But I can live with only 4.2 working well. I can't be bothered endlessly
> > > probing the DSs to see if they are out of space.
>
> > Agreed. Your server should be able to rely on the layout error reports
> > from the client (either in LAYOUTERROR or in the LAYOUTRETURN) in order
> > to figure out when the DS might be out of space.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
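
For anyone following along, here is a rough sketch of the server-side idea discussed above: the MDS notes a DS as out of space when a client reports NFS4ERR_NOSPC (via LayoutError on NFSv4.2, or in the LayoutReturn on NFSv4.1), and then replies NFS4ERR_NOSPC to later LayoutGets for that DS. This is not the FreeBSD pNFS server code. The table, MAX_DS, and all function names are hypothetical; only the two status values are taken from the NFSv4 protocol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* nfsstat4 values as defined by the NFSv4 protocol. */
#define NFS4_OK        0
#define NFS4ERR_NOSPC  28

/* Hypothetical upper bound on the number of DSs, for illustration only. */
#define MAX_DS 16

/* Hypothetical per-DS state kept by the MDS. */
struct ds_state {
    bool out_of_space;
};

static struct ds_state ds_table[MAX_DS];

/*
 * Called when a client reports a DS I/O error to the MDS, either in a
 * LayoutError (NFSv4.2) or in the error list of a LayoutReturn (NFSv4.1).
 * Only NFS4ERR_NOSPC is tracked in this sketch.
 */
static void mds_note_layout_error(size_t ds_index, uint32_t status)
{
    if (ds_index < MAX_DS && status == NFS4ERR_NOSPC)
        ds_table[ds_index].out_of_space = true;
}

/* Called when the server learns that space was freed on the DS. */
static void mds_note_space_freed(size_t ds_index)
{
    if (ds_index < MAX_DS)
        ds_table[ds_index].out_of_space = false;
}

/*
 * Status the MDS would put in its LayoutGet reply for a file stored on
 * the given DS: fail with NFS4ERR_NOSPC while the DS is marked full.
 */
static uint32_t mds_layoutget_status(size_t ds_index)
{
    if (ds_index < MAX_DS && ds_table[ds_index].out_of_space)
        return NFS4ERR_NOSPC;
    return NFS4_OK;
}
```

With something like this, the server never has to probe the DSs itself. It relies on the client's error reports, which on NFSv4.1 only arrive with the LayoutReturn, so the 4.1 case still gets the report late, as described above.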