Re: Mysterious ENOSPC

Chris Dunlop <chris@xxxxxxxxxxxx> · Fri, 17 Sep 2021 16:07:38 +1000

On Thu, Sep 02, 2021 at 11:42:06AM +1000, Dave Chinner wrote:
> On Mon, Aug 30, 2021 at 08:04:57AM +1000, Dave Chinner wrote:
> > FWIW, if you are using reflink heavily and you have rmap enabled (as
> > you have), there's every chance that an AG has completely run out of
> > space and so new rmap records for shared extents can't be allocated
> > - that can give you spurious ENOSPC errors before the filesystem is
> > 100% full, too.
> >
> > i.e. every shared extent in the filesystem has a rmap record
> > pointing back to each owner of the shared extent. That means for an
> > extent shared 1000 times, there are 1000 rmap records for that
> > shared extent. If you share it again, a new rmap record needs to be
> > inserted into the rmapbt, and if the AG is completely out of space
> > this can fail w/ ENOSPC. Hence you can get ENOSPC errors attempting
> > to share or unshare extents because there isn't space in the AG for
> > the tracking metadata for the new extent record....
...
> Ok, now I've seen the filesystem layout, I can say that the
> preconditions for per-ag ENOSPC conditions do actually exist. Hence
> we now really need to know what operation is reporting ENOSPC. I
> guess we'll just have to wait for that to occur again and hope your
> scripts capture it.
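A toy model of the per-AG argument above, offered as a rough sketch rather than anything resembling real XFS code. Sharing an extent copies no data, but each new owner adds an rmap record in the AG that holds the extent. So once that one AG has no free blocks left, a further share fails with ENOSPC even though the filesystem as a whole still has plenty of space. `AllocGroup`, `share_extent` and `records_per_block` are hypothetical names, and the model deliberately ignores XFS's per-AG metadata reservations, interior btree node splits and free-space btree overhead.

```python
import errno
from dataclasses import dataclass


@dataclass
class AllocGroup:
    """Hypothetical toy allocation group (AG): free blocks plus an rmap index.

    records_per_block is an invented figure; real rmapbt geometry depends on
    the filesystem block size and btree layout.
    """
    free_blocks: int
    records_per_block: int = 4
    rmap_records: int = 0
    rmap_blocks: int = 0

    def add_rmap_record(self) -> None:
        """Record one more owner of a shared extent living in this AG."""
        # When the existing rmap blocks are full, the index must grow by one
        # block, and that block can only come from this AG's free space.
        if self.rmap_records == self.rmap_blocks * self.records_per_block:
            if self.free_blocks == 0:
                raise OSError(errno.ENOSPC, "AG has no space for a new rmap block")
            self.free_blocks -= 1
            self.rmap_blocks += 1
        self.rmap_records += 1


def share_extent(ag: AllocGroup, times: int) -> int:
    """Share (reflink) an extent in `ag` up to `times` more times.

    Sharing copies no data, so the only space consumed is rmap metadata.
    Returns how many shares succeeded before the AG hit ENOSPC.
    """
    done = 0
    for _ in range(times):
        try:
            ag.add_rmap_record()
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
            break
        done += 1
    return done


if __name__ == "__main__":
    # AG 0 is nearly full; AG 1 has plenty of room.
    ags = [AllocGroup(free_blocks=2), AllocGroup(free_blocks=1000)]
    shared = share_extent(ags[0], 20)
    total_free = sum(ag.free_blocks for ag in ags)
    print(f"shares before ENOSPC: {shared}, filesystem free blocks: {total_free}")
    # prints "shares before ENOSPC: 8, filesystem free blocks: 1000"
```

The 1000 free blocks in the second AG never come into play, because in this model (as with the per-AG rmapbt) the new records have to live in the same AG as the extent they describe.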

FYI, "something" seems to have changed without any particular prompting 
and there haven't been any ENOSPC events in the last 3 weeks whereas 
previously they were occurring 4-5 times a week. Sigh.

Cheers,

Chris