Re: Highly reflinked and fragmented considered harmful?

On Tue, May 10, 2022 at 2:25 AM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, May 09, 2022 at 12:46:59PM +1000, Chris Dunlop wrote:
> > Hi,
> >
> > Is it to be expected that removing 29TB of highly reflinked and fragmented
> > data could take days, the entire time blocking other tasks like "rm" and
> > "df" on the same filesystem?
> >
[...]
> > The story...
> >
> > I did an "rm -rf" of a directory containing a "du"-indicated 29TB spread
> > over maybe 50 files. The data would have been highly reflinked and
> > fragmented. A large part of the reflinking would be to files outside the dir
> > in question, and I imagine maybe only 2-3TB of data would actually be freed
> > by the "rm".
>
> But it's still got to clean up 29TB of shared extent references.
> Assuming worst case reflink extent fragmentation of 4kB filesystem
> blocks, 29TB is roughly 7 *billion* references that have to be
> cleaned up.
>
> TANSTAAFL.
>
[...]
>
> IOWs, the problem here is that  you asked the filesystem to perform
> *billions* of update operations by running that rm -rf command and
> your storage simply isn't up to performing such operations.
>
> What reflink giveth you, reflink taketh away.
>

When I read this story, it sounds to me like the filesystem is to blame,
not the user.

First of all, the user did not "ask the filesystem to perform
*billions* of updates";
the user asked the filesystem to remove 50 huge files.
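
I don't dispute the arithmetic, by the way - at a worst case of 4kB per
shared extent, the scale is roughly:

  29 TiB / 4 KiB = 29 * 2^40 / 2^12 ~= 7.8 billion extent references

so "roughly 7 billion" is fair. The question is who ends up paying for
that work, and when.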

End users do not have to understand how a filesystem's unlink operation works.
But even if we agree that the user "asked the filesystem to perform *billions*
of updates" (just as with an rm -rf of billions of files), if the
filesystem says
"ok, I'll do it" and then hogs the system for 10 days,
there is probably something wrong with the system, not with the user.

Linux grew dirty page throttling for the same reason - so we could stop
blaming users who copied a movie to their USB pen drive for their system
getting stuck.

This incident sounds like a very serious problem - the sort of problem that
makes users leave a filesystem with a door slam, never come back, and
start tweeting about how awful fs X is.

And most users won't even try to analyse the situation as Chris did and
write it up for the xfs list before starting to tweet.

From a product POV, I think what should have happened here is that
freeing up the space would have taken 10 days in the background, but
otherwise the filesystem should not have blocked other processes
for long periods of time.

Of course, it would have been nice if there were a friendly user interface
to notify users of the progress of the background gc work.

All this is much easier said than done, but that does not make it less true.

Can we do anything to throttle background gc work to the point that it
has a less catastrophic effect on end users? Perhaps limit the number of
journal credits allowed to be consumed by gc work, so that "foreground"
operations are less likely to hang?
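
To make that a bit more concrete, here is a rough userspace sketch of
the kind of credit-based throttle I have in mind. It is purely
illustrative - the names are made up and it is not XFS code - the point
is just that background gc takes a credit from a small bounded pool
before it reserves log space, so foreground transactions always have
headroom left:

	#include <pthread.h>

	/* Illustrative only: a bounded pool of "credits" that background
	 * gc must take before reserving a transaction, so it can never
	 * monopolize the journal. None of these names are real XFS APIs. */
	#define GC_MAX_CREDITS	4

	static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t  gc_wait = PTHREAD_COND_INITIALIZER;
	static int gc_credits = GC_MAX_CREDITS;

	/* Background gc path: sleep until a credit is available. */
	void gc_credit_get(void)
	{
		pthread_mutex_lock(&gc_lock);
		while (gc_credits == 0)
			pthread_cond_wait(&gc_wait, &gc_lock);
		gc_credits--;
		pthread_mutex_unlock(&gc_lock);
	}

	/* Called once the background transaction has committed. */
	void gc_credit_put(void)
	{
		pthread_mutex_lock(&gc_lock);
		gc_credits++;
		pthread_cond_signal(&gc_wait);
		pthread_mutex_unlock(&gc_lock);
	}

Foreground operations would not go through this pool at all, only the
background gc workers, so the worst a huge reflinked rm could do is slow
itself down.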

I am willing to take a swing at it, if you point me in the right direction.

Thanks,
Amir.


