Re: Highly reflinked and fragmented considered harmful?

Chris Dunlop <chris@xxxxxxxxxxxx> · Tue, 10 May 2022 12:55:41 +1000

Hi Dave,

On Tue, May 10, 2022 at 09:09:18AM +1000, Dave Chinner wrote:
> On Mon, May 09, 2022 at 12:46:59PM +1000, Chris Dunlop wrote:
>> Is it to be expected that removing 29TB of highly reflinked and fragmented
>> data could take days, the entire time blocking other tasks like "rm" and
>> "df" on the same filesystem?
> ...
> At some point, you have to pay the price of creating billions of
> random fine-grained cross references in tens of TBs of data spread
> across weeks and months of production. You don't notice the scale of
> the cross-reference because it's taken weeks and months of normal
> operations to get there. It's only when you finally have to perform
> an operation that needs to iterate all those references that the
> scale suddenly becomes apparent. XFS scales to really large numbers
> without significant degradation, so people don't notice things like
> object counts or cross references until something like this
> happens.
>
> I don't think there's much we can do at the filesystem level to help
> you at this point - the inode output in the transaction dump above
> indicates that you haven't been using extent size hints to limit
> fragmentation or extent share/COW sizes, so the damage is already
> present and we can't really do anything to fix that up.

Thanks for taking the time to provide a detailed and informative
exposition. It certainly helps me understand what I'm asking of the fs,
the areas that deserve more attention, and how to approach analyzing the
situation.

At this point I'm about 3 days from completing copying the data (from a
snapshot of the troubled fs mounted with 'norecovery') over to a brand new
fs. Unfortunately the new fs is also rmapbt=1, so I'll go through all the
copying again (under more controlled circumstances) to get onto an rmapbt=0
fs (losing the ability to do online repairs whenever that arrives -
hopefully that won't come back to haunt me).
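
For reference, extent size hints of the kind mentioned above are set per
file or directory through the FS_IOC_FSSETXATTR ioctl, the same interface
behind the extsize and cowextsize commands in xfs_io. What follows is only
a minimal sketch in Python, assuming Linux on x86-64 or arm64 (the ioctl
numbers differ on a few other architectures). The helper names with_hints
and set_hints, the example path and the 1 MiB value are illustrative
assumptions, not recommendations from this thread.

```python
import fcntl
import os
import struct

# ioctl numbers and flags from <linux/fs.h> (x86-64 / arm64 encoding).
FS_IOC_FSGETXATTR = 0x801C581F
FS_IOC_FSSETXATTR = 0x401C5820
FS_XFLAG_EXTSIZE = 0x00000800       # extent size hint on a regular file
FS_XFLAG_EXTSZINHERIT = 0x00001000  # hint inherited by new files in a dir
FS_XFLAG_COWEXTSIZE = 0x00010000    # copy-on-write extent size hint

# struct fsxattr: xflags, extsize, nextents, projid, cowextsize, 8 pad bytes.
FSXATTR = struct.Struct("=5I8x")


def with_hints(raw, is_dir, extsize, cowextsize):
    """Return a packed fsxattr with the hints (in bytes) and flags applied."""
    xflags, _, nextents, projid, _ = FSXATTR.unpack(raw)
    xflags |= FS_XFLAG_EXTSZINHERIT if is_dir else FS_XFLAG_EXTSIZE
    xflags |= FS_XFLAG_COWEXTSIZE
    return FSXATTR.pack(xflags, extsize, nextents, projid, cowextsize)


def set_hints(path, extsize, cowextsize):
    """Read the current fsxattr of path, add the hints, and write it back."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(FSXATTR.size)
        fcntl.ioctl(fd, FS_IOC_FSGETXATTR, buf)  # fills buf in place
        new = with_hints(bytes(buf), os.path.isdir(path), extsize, cowextsize)
        fcntl.ioctl(fd, FS_IOC_FSSETXATTR, new)
    finally:
        os.close(fd)


# Hypothetical usage on an empty top-level directory, before copying data:
# set_hints("/mnt/newfs/data", 1 << 20, 1 << 20)
```

The hint sizes are given in bytes and need to be a multiple of the
filesystem block size. Setting the hint on a directory before populating
it lets new files inherit it, whereas a regular file generally needs the
extent size hint set before it has any extents allocated.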

Out of interest:

- with a reboot/remount, does the log replay continue from where it left
off, or start again?

Sorry, if you provided an answer to this, I didn't understand it.

Basically the question is, if a recovery on mount were going to take 10
hours, but the box rebooted and the fs was mounted again at the 8 hour
mark, would the recovery this time take 2 hours or once again 10 hours?

Cheers,

Chris