[LSF/MM/BPF TOPIC] Changing file system resize patterns

"Theodore Ts'o" <tytso@xxxxxxx> · Tue, 1 Mar 2022 01:34:39 -0500

Traditionally, most file systems' resize features were used to grow the
file system in relatively large chunks --- for example, when a 10 TB
disk is added to a RAID array.  However, cloud and embedded deployment
use cases have changed this.

One such new anti-pattern is an initial root file system which is
relatively small (a few GB) and is then "inflated" by resizing it
to a very large size, by a thousand times or more in some cases.  This
is not unique to the cloud, although it is quite common there.
(Another place where this anti-pattern is used is in some embedded
systems, where an image is dd'ed onto flash, and then expanded by
resizing the file system the first time the system is booted.)

A second anti-pattern is caused by the fact that most clouds charge
for the bytes that are provisioned for the emulated block device, as
opposed to the amount of space that is actually used (if the block
device was using something like a thin-provisioning scheme, as I
suspect many of them do).  So to optimize costs, many customers will
only resize the file system when it is 99% full, and then only grow it
by a small amount each time.  Unfortunately, this tends to be really
bad from a file system fragmentation perspective.

For the first anti-pattern, I can think of a number of possible ways
we could mitigate the problem.  One might be to change the defaults of
mkfs so that performance won't be that bad when a tiny file system is
grown significantly (e.g., a larger journal, enabling 64-bit block
numbers for ext4, etc.).  Unfortunately, this would waste a lot of
space for a fixed size file system, such as one that is placed on a USB
thumb drive.  So perhaps there should be some standardized way for
mkfs to determine whether the file system is one that is likely to be
grown (e.g., a GCE PD, AWS EBS, Azure Managed Disks) so it can
automatically DTRT?

Another possible solution is some kind of standardized format (perhaps
like qemu-img, but one which is documented) which can be used to
transmit a file system image which has been formatted to a large
provisioned size, but which can be shipped in a sparse, efficient
format and then "inflated" to the full size of the block device.
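
To make the sparse-format idea a bit more concrete, here is a minimal
sketch in Python.  Everything in it is hypothetical and exists only
for illustration: the magic string, the header layout, the 4 KiB block
size, and the function names are assumptions, not any existing or
proposed standard.  It records the full provisioned size once, stores
only the blocks that are not all zeros, and on the receiving side
writes those blocks back at their original offsets.

```python
import io
import struct
import tempfile

BLOCK_SIZE = 4096           # assumed block size, illustrative only
MAGIC = b"SPRSIMG0"         # hypothetical magic string, not a real format
HEADER = struct.Struct("<QI")   # (full size, block size), little-endian
EXTENT = struct.Struct("<QI")   # (byte offset, data length), little-endian


def pack_sparse(src, out, block_size=BLOCK_SIZE):
    """Copy only the non-zero blocks of a seekable image into `out`."""
    src.seek(0, io.SEEK_END)
    full_size = src.tell()
    src.seek(0)
    out.write(MAGIC)
    out.write(HEADER.pack(full_size, block_size))
    zero = bytes(block_size)
    offset = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        # Skip blocks that are entirely zero; they are implied by full_size.
        if block != zero[:len(block)]:
            out.write(EXTENT.pack(offset, len(block)))
            out.write(block)
        offset += len(block)


def inflate(src, target):
    """Recreate the full-size image in `target` (a seekable regular file)."""
    if src.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a sparse image in the sketch format")
    full_size, _block_size = HEADER.unpack(src.read(HEADER.size))
    target.truncate(full_size)      # unwritten regions read back as zeros
    while True:
        header = src.read(EXTENT.size)
        if not header:
            break
        offset, length = EXTENT.unpack(header)
        target.seek(offset)
        target.write(src.read(length))
    return full_size


def pack_bytes(image):
    """Convenience wrapper: pack an in-memory image, return the packed bytes."""
    out = io.BytesIO()
    pack_sparse(io.BytesIO(image), out)
    return out.getvalue()


def roundtrip(image):
    """Pack, inflate into a temporary file, and return the inflated bytes."""
    with tempfile.TemporaryFile() as target:
        inflate(io.BytesIO(pack_bytes(image)), target)
        target.seek(0)
        return target.read()
```

In this sketch, the packed size depends on the number of non-zero
blocks rather than on the provisioned size.  A real format would also
need checksums and a story for writing to block devices, which cannot
be extended with truncate(); both are left out here.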

I can't think of a lot of good solutions for the second
anti-pattern --- although if anyone has suggestions other than user
education, I'd love to hear them.

Cheers,

							- Ted