I have a user running a 1.7TB XFS filesystem at ~10% usage (as shown by
df) who is getting sporadic ENOSPC errors. The filesystem is mounted with
inode64 and holds a relatively small number of large files. The
underlying device is a single-member RAID0 array with a 1MB chunk size.
There are 32 AGs. The kernel is Linux 4.9.17.
The write load consists of AIO/DIO writes to files that are later
unlinked. The writes are non-size-changing (we truncate ahead), and we
use XFS_IOC_FSSETXATTR with XFS_XFLAG_EXTSIZE to set an extent size hint
of 32MB. The errors happen on commit log files, which have a target size
of 32MB (but may exceed it a little).
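For reference, here is a minimal sketch of the per-file setup described
above: set the extent size hint on the still-empty file, then truncate
ahead to the target size. It is an illustration, not our actual code. The
helper names (apply_extsize_hint, prepare_commitlog) and the two size
macros are made up for this example, and it uses the generic FS_IOC_* and
FS_XFLAG_* names from <linux/fs.h>, which as far as I know are what the
XFS_* names map to on recent headers. The caller is assumed to have
opened the file with O_DIRECT.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* struct fsxattr, FS_IOC_FS[GS]ETXATTR, FS_XFLAG_EXTSIZE */

/* Illustrative values taken from the description: 32MB target, 32MB hint. */
#define COMMITLOG_SIZE  (32u << 20)
#define EXTSIZE_HINT    (32u << 20)

/* Hypothetical helper: merge an extent size hint into existing attributes,
 * leaving any other flags that were already set untouched. */
static void apply_extsize_hint(struct fsxattr *attr, uint32_t hint_bytes)
{
    attr->fsx_xflags |= FS_XFLAG_EXTSIZE;
    attr->fsx_extsize = hint_bytes;   /* the hint is given in bytes */
}

/* Hypothetical helper: prepare a freshly created, still empty file.
 * Returns 0 on success, -1 with errno set on failure. */
static int prepare_commitlog(int fd)
{
    struct fsxattr attr;

    /* Read the current attributes so we only modify what we mean to. */
    if (ioctl(fd, FS_IOC_FSGETXATTR, &attr) < 0)
        return -1;

    /* Set the hint before any blocks have been allocated to the file. */
    apply_extsize_hint(&attr, EXTSIZE_HINT);
    if (ioctl(fd, FS_IOC_FSSETXATTR, &attr) < 0)
        return -1;

    /* Truncate ahead so the later DIO writes do not change the file size. */
    if (ftruncate(fd, COMMITLOG_SIZE) < 0)
        return -1;

    return 0;
}
```

A caller would open the file with O_CREAT | O_WRONLY | O_DIRECT, call
prepare_commitlog(fd), and only then start submitting AIO writes. On a
filesystem that does not support these ioctls, the call fails and returns
-1 rather than silently skipping the hint.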
The errors are sporadic: after restarting the workload they go away
for a few hours to a few days, but then return. During one of the
crashes I used xfs_db to look at free space fragmentation and saw that
in most AGs the largest free extents fell in the 128-255 block size
bucket, though a few AGs had larger ones. I tried xfs_fsr, but it did
not help.
Is this a known issue? Would upgrading the kernel help?
I'll try to get a metadata dump next time this happens, and I'll be
happy to supply more information.