Per Ted's request, I've started editing a document on the ext4 wiki:
https://ext4.wiki.kernel.org/index.php/Ext4_VM_Images

[comments below too]

On Fri, Feb 14, 2014 at 06:46:31PM -0500, Theodore Ts'o wrote:
> On Fri, Feb 14, 2014 at 03:19:05PM -0500, Jon Bernard wrote:
> > Ahh, I see.  Here's where this comes from: the particular use case is
> > provisioning of new cloud instances whose root volume is of unknown
> > size.  The filesystem and its contents are created and bundled
> > beforehand into the smallest filesystem possible.  The instance is PXE
> > booted for provisioning and the root filesystem is then copied onto the
> > disk - and then resized to take advantage of the total amount of space.
> >
> > In order to support very large partitions, the filesystem is created
> > with an abnormally large inode table so that large resizes would be
> > possible.  I traced it to this commit as best I can tell:
> >
> >     https://github.com/openstack/diskimage-builder/commit/fb246a02eb2ed330d3cc37f5795b3ed026aabe07
> >
> > I assumed that additional inodes would be allocated along with block
> > groups during an online resize, but that commit contradicts my current
> > understanding.
>
> Additional inodes *are* allocated as the file system is grown.
> Whoever thought otherwise was wrong.  What happens is that there is a
> fixed number of inodes per block group.  When the file system is
> resized, either by growing or shrinking the file system, as block
> groups are added or removed from the file system, the corresponding
> inodes are also added or removed.
>
> > I suggested that the filesystem be created during the time of
> > provisioning to allow a more optimal on-disk layout, and I believe this
> > is being considered now.
>
> What causes the most damage in terms of a non-optimal data block
> layout is installing the file system on a large file system, and then
> shrinking the file system to its minimum size using resize2fs -M.
> There is also some non-optimality that occurs as the file system gets
> filled beyond about 90% full, but it's not nearly so bad as shrinking
> the file system --- which you should avoid at all costs.
>
> From a performance point of view, the only time you should try to do
> an off-line resize2fs shrink is if you are shrinking the file system
> by a handful of blocks as part of converting a file system in place to
> use LVM or LUKS encryption, and you need to make room for some
> metadata blocks at the end of the partition.
>
> The other thing to note is that if you are using a format such
> as qcow2, or something like the device-mapper's thin-provisioning
> (thinp) scheme, or if you are willing to deal with sparse files, one
> approach is to not resize the file system at all.  You could just use
> a tool like zerofree[1] to zero out all of the unused blocks in the
> file system, and then use "/bin/cp --sparse=always" to cause all zero
> blocks to be treated as sparse blocks on the destination file.
>
> [1] http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/util/zerofree.c

I have a zerofree variant that knows how to punch/discard blocks that I'll
throw into contrib/ the next time I send out one of my megapatch sets.

> This is part of how I maintain my root filesystem that I use in a VM
> for testing ext4 changes upstream.  After I update to the latest
> Debian unstable packages and install the latest updates from the
> xfstests and e2fsprogs git repositories, I then run the following
> script which uses the zerofree.c program to compress the qcow2 root
> file system image that I use with kvm:
>
>     http://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/compress-rootfs
>
>
> Also, starting with e2fsprogs 1.42.10, there's another way you can

These three options (-rap) are available in 1.42.9.  Is there a particular
reason not to use them before 1.42.10?

> efficiently deploy a large file system image by only copying the
> blocks which are in use, by using a command like this:
>
>     e2image -rap src_fs dest_fs
>
> (See also the -c flag as described in e2image's man page if you want
> to use this technique to do incremental image-based backups onto a
> flash-based backup medium; I was using this for a while to keep two
> laptop SSD's root filesystem in sync with one another.)
>
> So there are lots of ways that you can do what you need, all without
> playing games with resize2fs.  Perhaps some of them would actually be
> better for your use case.

Calvin Watson noted on Ted's G+ repost that one can use fstrim in newer
versions of QEMU (1.5+?) to punch out unused blocks if the virtual disk is
emulated via virtio-scsi.

--D

>
> > If it turns out to be not terribly complicated and there is not an
> > immediate time constraint, I would love to try to help with this or at
> > least test patches.
>
> I will hopefully have a bug fix in the next week or two.
>
> Cheers,
>
>                                         - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
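
For the wiki page, the sparse-copy step from the thread above (zero the
unused blocks, then copy so that all-zero blocks become holes) could be
sketched roughly as follows.  This is only an illustration, assuming GNU
coreutils cp and stat; the function names and the image file names are
made up for the example, and it is not what compress-rootfs itself does.

```shell
#!/bin/sh
# Rough sketch: copy an image file so that runs of zero bytes become holes.
# Assumes GNU coreutils (cp --sparse=always, stat -c).
# sparse_copy and show_usage are hypothetical helper names.

sparse_copy() {
    src=$1
    dst=$2
    # --sparse=always asks cp to create a hole in the destination wherever
    # the source contains a sufficiently long run of zero bytes.
    cp --sparse=always "$src" "$dst"
}

# Print the apparent size next to the space actually allocated on disk.
show_usage() {
    printf '%s: apparent=%s bytes, allocated=%s KiB\n' "$1" \
        "$(stat -c %s "$1")" "$(du -k "$1" | cut -f1)"
}
```

One would first zero the free blocks of the file system inside the image
(for example with zerofree, on an unmounted file system), then run
something like "sparse_copy rootfs.img rootfs-sparse.img" followed by
"show_usage rootfs-sparse.img".  The copy has the same contents and
apparent size as the source; how much space it actually saves depends on
whether the destination file system supports sparse files.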