Hi everyone, I'm working with several Red Hat derived systems and have noticed an issue with ENOSPC and NFS that I'm looking for some guidance on. First let me describe the testing setup, and then I'll share my results from an EL7 based system (kernel 3.10.0-1160.90.1.el7), an EL8 based system (kernel 4.18.0-425.19.2.el8_7), an EL8 based system patched with commit e6005436f6cc9ed13288f936903f0151e5543485 (kernel 4.18.0-425.19.2.el8_7 plus that commit), and finally an EL8 based system but with an upstream 6.1 kernel. Assume I have a 20M quota on my current working directory which is an NFS export from one of the major enterprise vendors. The testing looks like the following: # rm -f file1 # touch file1 # dd bs=1M count=20 if=/dev/zero of=file2 # this will use all the quota 20+0 records in 20+0 records out 20971520 bytes (21 MB, 20 MiB) copied, 0.193018 s, 109 MB/s # tee -a file1 <<< abc abc tee: file1: No space left on device # rm -f file2 # tee -a file1 <<< abc abc On an EL7 based system, running the above works just as shown. I.e. you create file1, then use all the quota with file2, attempt to write to file1 which fails with ENOSPC (as expected), remove file2 (which frees up the quota), and then attempt to write to file1 again which succeeds. However, on a stock EL8 based system, I instead get the following surprising behavior: # rm -f file1 # touch file1 # dd bs=1M count=20 if=/dev/zero of=file2 # this will use all the quota 20+0 records in 20+0 records out 20971520 bytes (21 MB, 20 MiB) copied, 0.193018 s, 109 MB/s # tee -a file1 <<< abc abc tee: file1: No space left on device # rm -f file2 # tee -a file1 <<< abc abc tee: file1: No space left on device # tee -a file1 <<< abc abc tee: file1: No space left on device I.e. Even after freeing the space by removing file2, writing to file1 continues to fail with ENOSPC forever (I've only shown 2 iterations above) [1]. No amount of waiting will cause it to go away. But, we've found that running sync(1) on the file will fix it (the sync itself will complain with ENOSPC, but then subsequent tee invocatinos succeed). I thought that perhaps the issue was the fact that kernel 4.18.0-425.19.2.el8_7 was missing commit e6005436f6cc9ed13288f936903f0151e5543485 (which adds some ENOSPC handling to `nfs_file_write'), so we patched the kernel with that patch and tested again. It's worth saying that with this patch, the behavior of our 4.18 kernel and the 6.1 kernel are consistent when running this test, but I feel like there might still be a bug here. What I get now is: # rm -f file1 # touch file1 # dd bs=1M count=20 if=/dev/zero of=file2 # this will use all the quota 20+0 records in 20+0 records out 20971520 bytes (21 MB, 20 MiB) copied, 0.193018 s, 109 MB/s # tee -a file1 <<< abc abc tee: file1: No space left on device # rm -f file2 # tee -a file1 <<< abc abc tee: file1: No space left on device # tee -a file1 <<< abc abc I.e. The first attempt to write to the file after freeing the quota fails with ENOSPC, but subsequent attempts succeed. Note that this is different from the original behavior on our EL7 based system as shown above where as soon as the quota is freed up, there are no more ENOSPC errors. I'm no expert, but below I'm including some digging I did in case it's helpful for understanding the situation more fully without needing to reproduce yourselves. If it's not helpful and just distracting, apologies in advance! >From strace'ing and systemtap'ing I noticed that the first call to `tee' (after the quota is used up by file2) returns the ENOSPC in response to close(2) (i.e. via `nfs_file_flush') and the second call to `tee' (after the quota has been freed) returns the ENOSPC in response to the write(2) (i.e. via `nfs_file_write' , and then clears the error via the changes we introduced with commit e6005436f6cc9ed13288f936903f0151e5543485). Here is the output of [2], run while reproducing (the comments and spacing are mine just to more easily be able to tell things apart): # stap -vv --vp 10101 /tmp/t.stp ... # This section is the first tee invocation when the quota is fully consumed by file2 1686164924543967263: tee module("nfs").function("nfs_file_write@fs/nfs/file.c:592").call f_wb_err: wb_err: flags:0x0 1686164924543983309: tee module("nfs").function("nfs_file_write@fs/nfs/file.c:592").return f_wb_err: wb_err: flags:0x0 1686164924543986715: tee module("nfs").function("nfs_file_flush@fs/nfs/file.c:139").call f_wb_err: wb_err: flags:0x0 1686164924545701586: tee module("nfs").function("nfs_file_flush@fs/nfs/file.c:139").return f_wb_err: wb_err:ENOSPC flags:0x0 # This section is the second tee invocation when the quota has been freed 1686164933146193798: tee module("nfs").function("nfs_file_write@fs/nfs/file.c:592").call f_wb_err: wb_err:ENOSPC flags:0x0 1686164933147496167: tee module("nfs").function("nfs_file_write@fs/nfs/file.c:592").return f_wb_err: wb_err: flags:0x0 1686164933147623303: tee module("nfs").function("nfs_file_flush@fs/nfs/file.c:139").call f_wb_err: wb_err: flags:0x0 1686164933147627755: tee module("nfs").function("nfs_file_flush@fs/nfs/file.c:139").return f_wb_err: wb_err: flags:0x0 # This section is the third tee invocation that succeeds with no errors 1686164937358310484: tee module("nfs").function("nfs_file_write@fs/nfs/file.c:592").call f_wb_err: wb_err: flags:0x0 1686164937358323369: tee module("nfs").function("nfs_file_write@fs/nfs/file.c:592").return f_wb_err: wb_err: flags:0x0 1686164937358326649: tee module("nfs").function("nfs_file_flush@fs/nfs/file.c:139").call f_wb_err: wb_err: flags:0x0 1686164937359877744: tee module("nfs").function("nfs_file_flush@fs/nfs/file.c:139").return f_wb_err: wb_err: flags:0x0 I barely know what I'm looking at, but I wondered whether `nfs_file_flush' should be resetting the error when it returns by calling `file_check_and_advance_wb_err' instead of `filemap_check_wb_err'. Given that it's reported the error to user space, is that sufficient to clear the error? Or, does it need to keep the error because there are dirty pages that haven't been written back yet? I guess what I'm asking is, is the above behavior we've observed on EL8 patched with e6005436f6cc9ed13288f936903f0151e5543485 and separately with 6.1 the expected behavior? Or should it behave like it did in EL7? Any insight or direction would be greatly appreciated! Chris [1]: It's not actually the write(2) that is giving the ENOSPC, it's the close(2) and in fact the bytes successfully wind up in the file even though the command appears to fail. [2]: A systemtap script I threw together to look at fields I thought might come into play: function dump(file:long) { printf("%d: %s %70s f_wb_err:%6s wb_err:%6s flags:%#x\n", gettimeofday_ns(), execname(), pp(), errno_str(@cast(file, "struct file")->f_wb_err), errno_str(@cast(file, "struct file")->f_mapping->wb_err), @cast(file, "struct file")->f_mapping->flags); } probe module("nfs").function("nfs_file_write").call { if (!isinstr(execname(), "tee")) next dump(@cast($iocb, "struct kiocb")->ki_filp); } probe module("nfs").function("nfs_file_write").return { if (!isinstr(execname(), "tee")) next dump(@cast(@entry($iocb), "struct kiocb")->ki_filp); } probe module("nfs").function("nfs_file_flush").call { if (!isinstr(execname(), "tee")) next dump($file); } probe module("nfs").function("nfs_file_flush").return { if (!isinstr(execname(), "tee")) next dump(@entry($file)); }