Re: Too many ENOSPC errors

On Thu, Jun 8, 2023 at 4:50 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Thu, 2023-06-08 at 13:05 -0400, Chris Perl wrote:
> > Hi everyone,
> >
> > I'm working with several Red Hat derived systems and have noticed an
> > issue with ENOSPC and NFS that I'm looking for some guidance on.
> >
> > First let me describe the testing setup, and then I'll share my
> > results from an EL7 based system (kernel 3.10.0-1160.90.1.el7), an EL8
> > based system (kernel 4.18.0-425.19.2.el8_7), an EL8 based system
> > patched with commit e6005436f6cc9ed13288f936903f0151e5543485 (kernel
> > 4.18.0-425.19.2.el8_7 plus that commit), and finally an EL8 based
> > system but with an upstream 6.1 kernel.
> >
> > Assume I have a 20M quota on my current working directory which is an
> > NFS export from one of the major enterprise vendors.
> >
> > The testing looks like the following:
> >
> > # rm -f file1
> > # touch file1
> > # dd bs=1M count=20 if=/dev/zero of=file2 # this will use all the quota
> > 20+0 records in
> > 20+0 records out
> > 20971520 bytes (21 MB, 20 MiB) copied, 0.193018 s, 109 MB/s
> > # tee -a file1 <<< abc
> > abc
> > tee: file1: No space left on device
> > # rm -f file2
> > # tee -a file1 <<< abc
> > abc
> >
> > On an EL7 based system, running the above works just as shown. I.e.
> > you create file1, then use all the quota with file2, attempt to write
> > to file1 which fails with ENOSPC (as expected), remove file2 (which
> > frees up the quota), and then attempt to write to file1 again which
> > succeeds.
> >
> > However, on a stock EL8 based system, I instead get the following
> > surprising behavior:
> >
> > # rm -f file1
> > # touch file1
> > # dd bs=1M count=20 if=/dev/zero of=file2 # this will use all the quota
> > 20+0 records in
> > 20+0 records out
> > 20971520 bytes (21 MB, 20 MiB) copied, 0.193018 s, 109 MB/s
> > # tee -a file1 <<< abc
> > abc
> > tee: file1: No space left on device
> > # rm -f file2
> > # tee -a file1 <<< abc
> > abc
> > tee: file1: No space left on device
> > # tee -a file1 <<< abc
> > abc
> > tee: file1: No space left on device
> >
> > I.e. Even after freeing the space by removing file2, writing to file1
> > continues to fail with ENOSPC forever (I've only shown 2 iterations
> > above) [1]. No amount of waiting will cause it to go away. But, we've
> > found that running sync(1) on the file will fix it (the sync itself
> > will complain with ENOSPC, but then subsequent tee invocations
> > succeed).
> >
> > I thought that perhaps the issue was the fact that kernel
> > 4.18.0-425.19.2.el8_7 was missing commit
> > e6005436f6cc9ed13288f936903f0151e5543485 (which adds some ENOSPC
> > handling to `nfs_file_write'), so we patched the kernel with that
> > patch and tested again. It's worth saying that with this patch, the
> > behavior of our 4.18 kernel and the 6.1 kernel are consistent when
> > running this test, but I feel like there might still be a bug here.
> >
> > What I get now is:
> >
> > # rm -f file1
> > # touch file1
> > # dd bs=1M count=20 if=/dev/zero of=file2 # this will use all the quota
> > 20+0 records in
> > 20+0 records out
> > 20971520 bytes (21 MB, 20 MiB) copied, 0.193018 s, 109 MB/s
> > # tee -a file1 <<< abc
> > abc
> > tee: file1: No space left on device
> > # rm -f file2
> > # tee -a file1 <<< abc
> > abc
> > tee: file1: No space left on device
> > # tee -a file1 <<< abc
> > abc
> >
> > I.e. The first attempt to write to the file after freeing the quota
> > fails with ENOSPC, but subsequent attempts succeed. Note that this is
> > different from the original behavior on our EL7 based system as shown
> > above where as soon as the quota is freed up, there are no more ENOSPC
> > errors.
> >
> > I'm no expert, but below I'm including some digging I did in case it's
> > helpful for understanding the situation more fully without needing to
> > reproduce yourselves. If it's not helpful and just distracting,
> > apologies in advance!
> >
> > From strace'ing and systemtap'ing I noticed that the first call to
> > `tee' (after the quota is used up by file2) returns the ENOSPC in
> > response to close(2) (i.e. via `nfs_file_flush') and the second call
>
> That is (unfortunately) expected behavior. I've argued (mostly
> unsuccessfully) for years that we shouldn't return writeback errors in
> the close() codepath.
>
> No program should rely on looking for those. The only "legit" error on
> close() is -EBADF.
>
> > to `tee' (after the quota has been freed) returns the ENOSPC in
> > response to the write(2) (i.e. via `nfs_file_write'), and then clears
> > the error via the changes we introduced with commit
> > e6005436f6cc9ed13288f936903f0151e5543485).
> >
>
> Looking at nfs_file_write, it's already tracking errors itself during
> the write. Does this patch fix that? Note that I've not tested this --
> YMMV!

Unfortunately that patch doesn't seem to help.

Since we applied commit e6005436f6cc9ed13288f936903f0151e5543485 and
it seemed to improve the situation (from an unbounded number of ENOSPC
errors to only one additional ENOSPC error), I believe that implies
the error we're seeing is coming from `filemap_check_wb_err', not
`generic_write_sync' in `nfs_file_write'.

I'll do some more tracing and see if I can narrow it down a bit more.
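
In case it helps anyone reproduce this without strace, here is a minimal
sketch of a userspace probe that does roughly what `tee -a' does (open
with O_APPEND, write, close) and records the errno from each syscall
separately, so it is obvious whether ENOSPC surfaces at write(2),
fsync(2) or close(2). The names `probe_append' and `struct probe_result'
are made up for illustration; nothing here comes from the kernel tree,
and I have only reasoned about it against a local filesystem, not the
NFS export above.

```c
/* Sketch: record which syscall reports an error while appending to a file. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * errno observed at each step (hypothetical helper type).
 *  0 means the step succeeded, -1 means the step was not attempted.
 */
struct probe_result {
	int open_errno;
	int write_errno;
	int fsync_errno;
	int close_errno;
};

/*
 * Append `data' to `path', mimicking `tee -a': open(O_APPEND), write, close.
 * If do_fsync is non-zero, fsync() is called before close().
 * A short write is recorded as success; only write() == -1 counts as failure.
 * Returns 0 if every attempted step succeeded, -1 otherwise.
 */
int probe_append(const char *path, const char *data, int do_fsync,
		 struct probe_result *out)
{
	assert(path && data && out);

	out->open_errno = 0;
	out->write_errno = -1;
	out->fsync_errno = -1;
	out->close_errno = -1;

	int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
	if (fd < 0) {
		out->open_errno = errno;
		return -1;
	}

	ssize_t n = write(fd, data, strlen(data));
	out->write_errno = (n < 0) ? errno : 0;

	if (do_fsync)
		out->fsync_errno = (fsync(fd) < 0) ? errno : 0;

	/* Writeback errors may also be reported here (e.g. via a flush on close). */
	out->close_errno = (close(fd) < 0) ? errno : 0;

	if (out->write_errno != 0)
		return -1;
	if (do_fsync && out->fsync_errno != 0)
		return -1;
	if (out->close_errno != 0)
		return -1;
	return 0;
}
```

Calling probe_append("file1", "abc\n", 0, &r) at each point in the test
sequence and printing the four fields of `r' should show which call
carries the ENOSPC; passing do_fsync=1 adds the explicit sync step.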

> ----------------------8<------------------------
>
> [RFC PATCH] nfs: ignore the error from generic_write_sync
>
> In the write codepath, we're only interested in writeback errors that
> occur after the point where the write has started. It's possible though
> that there were previous errors stored in the mapping before the write
> ever began, in which case generic_write_sync will return an error.
>
> We already track errors over the part we're interested in, so we can
> safely discard errors from generic_write_sync.
>
> Reported-by: Chris Perl <cperl@xxxxxxxxxxxxxx>
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---
>  fs/nfs/file.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index f0edf5a36237..3ca1ffb1245e 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -673,10 +673,14 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
>                                         iocb->ki_pos - written,
>                                         iocb->ki_pos - 1);
>         }
> -       result = generic_write_sync(iocb, written);
> -       if (result < 0)
> -               return result;
>
> +       /*
> +        * For a write, we're only interested in errors that occur
> +        * after the point where we sample the wb_error. Ignore
> +        * errors from generic_write_sync, which may have occurred
> +        * before that point.
> +        */
> +       generic_write_sync(iocb, written);
>  out:
>         /* Return error values */
>         error = filemap_check_wb_err(file->f_mapping, since);
> --
> 2.40.1
>
>
