Re: ENOSPC on a 10% used disk

Avi Kivity <avi@xxxxxxxxxxxx> · Thu, 7 Feb 2019 12:51:06 +0200

On 05/02/2019 23.48, Dave Chinner wrote:
Hi Avi,

On Wed, Oct 17, 2018 at 10:52:48AM +0300, Avi Kivity wrote:
I have a user running a 1.7TB filesystem with ~10% usage (as shown
by df), getting sporadic ENOSPC errors. The disk is mounted with
inode64 and has a relatively small number of large files. The disk
is a single-member RAID0 array, with 1MB chunk size. There are 32
AGs. Running Linux 4.9.17.

The write load consists of AIO/DIO writes, followed by unlinks of
these files. The writes are non-size-changing (we truncate ahead)
and we use XFS_IOC_FSSETXATTR/XFS_XFLAG_EXTSIZE with a hint size of
32MB. The errors happen on commit logs, which have a target size of
32MB (but may exceed it a little).
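For readers unfamiliar with the extent size hint mentioned above, here is a minimal sketch of setting one from C. It uses the generic FS_IOC_FSGETXATTR/FS_IOC_FSSETXATTR ioctls and FS_XFLAG_EXTSIZE from <linux/fs.h> (xfsprogs' headers expose the same ioctls under XFS_IOC_* names). The helper names are hypothetical, and the read-modify-write approach is one reasonable choice, not necessarily what the application in this thread does.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FS{GET,SET}XATTR, FS_XFLAG_EXTSIZE */

/* Request an extent size hint of `bytes` in *fa, keeping any other
 * flags that are already set. */
static void fsxattr_set_extsize(struct fsxattr *fa, uint32_t bytes)
{
    fa->fsx_xflags |= FS_XFLAG_EXTSIZE;
    fa->fsx_extsize = bytes;            /* in bytes, not blocks */
}

/* Read the current attributes, change only the extent size hint, and
 * write them back. Returns 0 on success, -1 with errno set on failure. */
static int set_extsize_hint(int fd, uint32_t bytes)
{
    struct fsxattr fa;

    memset(&fa, 0, sizeof(fa));
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0)
        return -1;
    fsxattr_set_extsize(&fa, bytes);
    return ioctl(fd, FS_IOC_FSSETXATTR, &fa) < 0 ? -1 : 0;
}
```

fsx_extsize should be a multiple of the filesystem block size, and XFS generally refuses to change the hint on a regular file that already has data extents, so a call such as set_extsize_hint(fd, 32u << 20) belongs right after creating the file, before the first write.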

The errors are sporadic and after restarting the workload they go
away for a few hours to a few days, but then return. During one of
the crashes I used xfs_db to look at fragmentation and saw that most
AGs had free extents of size categories up to 128-255, but a few had
more. I tried xfs_fsr but it did not help.
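To put the fragmentation numbers in perspective, the sketch below places a free extent into a power-of-two bucket the way xfs_db's freesp histogram groups them (1, 2-3, 4-7, ...) and converts the 32MB hint into filesystem blocks. The 4 KiB block size used in the note after it is an assumption; the thread does not state the block size. The function names are hypothetical.

```c
#include <stdint.h>

/* Power-of-two bucket [lo, hi] that holds a free extent of `len` blocks
 * (len >= 1), in the style of xfs_db's freesp histogram. */
static void freesp_bucket(uint64_t len, uint64_t *lo, uint64_t *hi)
{
    uint64_t b = 1;

    while (b <= len / 2)    /* find the largest power of two <= len */
        b <<= 1;
    *lo = b;
    *hi = 2 * b - 1;
}

/* Number of filesystem blocks spanned by an extent size hint. */
static uint64_t hint_blocks(uint64_t hint_bytes, uint64_t block_size)
{
    return hint_bytes / block_size;
}
```

With 4 KiB blocks, an extent of 200 blocks lands in the 128-255 bucket, while hint_blocks(32u << 20, 4096) is 8192. In most AGs, then, even the largest free extent covered less than 1 MiB of a 32 MiB hint-aligned request. That alone does not prove the failure, but it shows why such a request could only be partially satisfied there.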

Is this a known issue? Would upgrading the kernel help?

Long time, I know, but Brian has just made me aware of this commit
from early 2018 that went into 4.16 that might be relevant and so I
thought it best to close the loop:

commit 6d8a45ce29c7d67cc4fc3016dc2a07660c62482a
Author: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
Date:   Fri Jan 19 17:47:36 2018 -0800

     xfs: don't screw up direct writes when freesp is fragmented

     xfs_bmap_btalloc is given a range of file offset blocks that must be
     allocated to some data/attr/cow fork.  If the fork has an extent size
     hint associated with it, the request will be enlarged on both ends to
     try to satisfy the alignment hint.  If free space is fragmented,
     sometimes we can allocate some blocks but not enough to fulfill any of
     the requested range.  Since bmapi_allocate always trims the new extent
     mapping to match the originally requested range, this results in
     bmapi_write returning zero and no mapping.

     The consequences of this vary -- buffered writes will simply re-call
     bmapi_write until it can satisfy at least one block from the original
     request.  Direct IO overwrites notice nmaps == 0 and return -ENOSPC
     through the dio mechanism out to userspace with the weird result that
     writes fail even when we have enough space because the ENOSPC return
     overrides any partial write status.  For direct CoW writes the situation
     was disastrous because nobody notices us returning an invalid zero-length
     wrong-offset mapping to iomap and the write goes off into space.

     Therefore, if free space is so fragmented that we managed to allocate
     some space but not enough to map into even a single block of the
     original allocation request range, we should break the alignment hint in
     order to guarantee at least some forward progress for the direct write.
     If we return a short allocation to iomap_apply it'll call back about the
     remaining blocks.

     Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
     Reviewed-by: Christoph Hellwig <hch@xxxxxx>
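The failure and the fix described in the commit message can be illustrated with a toy model. This is a simplified sketch, not kernel code: units are file blocks, `max_extent` is a hypothetical stand-in for the largest free extent the allocator finds, the allocation is assumed to land at the start of the hint-aligned range, and free space is not consumed between calls. All names are hypothetical.

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* Round `off` down to a multiple of `hint`. */
static uint64_t align_down(uint64_t off, uint64_t hint) { return off / hint * hint; }

/* Round `end` up to a multiple of `hint`. */
static uint64_t align_up(uint64_t end, uint64_t hint) { return (end + hint - 1) / hint * hint; }

/* Blocks of the request [off, off + len) mapped by one allocation call.
 * The aligned attempt maps [astart, astart + got), and the result is then
 * trimmed to the original request. With break_hint == 0 this mimics the
 * old behaviour (the trimmed mapping can be empty, which direct I/O turned
 * into ENOSPC); with break_hint != 0 it falls back to allocating at `off`
 * itself, ignoring the alignment, as the fix does. */
static uint64_t map_once(uint64_t off, uint64_t len, uint64_t hint,
                         uint64_t max_extent, int break_hint)
{
    uint64_t astart = align_down(off, hint);
    uint64_t aend = align_up(off + len, hint);
    uint64_t got = min_u64(aend - astart, max_extent);
    uint64_t mapped_end = astart + got;

    if (mapped_end > off)               /* allocation reached the request */
        return min_u64(mapped_end, off + len) - off;
    if (break_hint)                     /* give up alignment, make progress */
        return min_u64(len, max_extent);
    return 0;                           /* zero-length mapping */
}

/* Caller loop, in the spirit of iomap_apply asking again for the blocks
 * that remain after a short mapping. Returns the number of calls needed
 * to map the whole range, or 0 if a call made no progress. */
static unsigned map_range(uint64_t off, uint64_t len, uint64_t hint,
                          uint64_t max_extent, int break_hint)
{
    unsigned calls = 0;

    while (len > 0) {
        uint64_t n = map_once(off, len, hint, max_extent, break_hint);
        if (n == 0)
            return 0;
        off += n;
        len -= n;
        calls++;
    }
    return calls;
}
```

For example, with an 8192-block hint and free extents of at most 255 blocks, a 16-block write at block 5000 maps nothing under the old behaviour, because the 255 blocks allocated at the aligned start (block 0) end long before block 5000. With the fallback, the same write maps all 16 blocks, and a 600-block write completes in three short mappings instead of failing.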

The spurious ENOSPC symptoms seem to match what you are seeing here
on your customer's 4.9 kernel, so it may be that this is the fix for
the ENOSPC problem that was reported. If this comes up again, then
perhaps it would be worth either upgrading the kernel to 4.16+ or
backporting this commit to see if it fixes the problem.
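Since the suggestion hinges on the kernel version, here is a small sketch for checking whether the running kernel reports at least 4.16, using uname(2). Treat the result only as a hint: distribution kernels often carry backports, so a kernel reporting an older version may already contain commit 6d8a45ce29c7, and the kernel's own source or changelog is the reliable check. The helper names are hypothetical.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* 1 if a release string such as "4.9.17" or "5.4.0-42-generic" is at
 * least major.minor, 0 if it is older, -1 if it cannot be parsed. */
static int release_at_least(const char *release, int major, int minor)
{
    int ma, mi;

    if (sscanf(release, "%d.%d", &ma, &mi) != 2)
        return -1;
    return (ma > major || (ma == major && mi >= minor)) ? 1 : 0;
}

/* Same check applied to the running kernel's release string. */
static int running_kernel_at_least(int major, int minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, major, minor);
}
```

For the kernel in the original report, release_at_least("4.9.17", 4, 16) returns 0, so it would need either an upgrade or a backport of the commit.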

Thanks for remembering. Indeed, it looks like a good match for the
problem. We did not see the problem again (it took quite a combination
of screwups to trigger), but I'll remember this in case we do.