On Wed, Jun 28, 2023 at 04:35:50PM +0200, Roberto Ragusa wrote:
> On 6/28/23 02:03, Theodore Ts'o wrote:
>
> > Unfortunately, (a) there is no place where the fact that the file
> > system was created with this mkfs option is recorded in the
> > superblock, and (b) once the file system starts getting used, the
> > blocks where the metadata would need to be allocated at the start of
> > the disk will get used for directory and data blocks.
>
> Isn't resize2fs already capable of migrating directory and data blocks
> away? According to the comments at the beginning of resize2fs.c, I mean.

Yes, but (a) that can only be done off-line (while the file system is
unmounted), and (b) migrating directory and data blocks is quite slow
and inefficient, and it doesn't necessarily leave the data files laid
out optimally (it doesn't do as much as it could to minimize file
fragmentation during the migration process).  It was intended for
moving a very small number of blocks, and while it could be improved,
that would be additional software engineering investment.

> 1. reserve the bitmaps and inode table space since the beginning (with mke2fs
> option resize, for example)
> 3. do not add new inodes when expanding (impossible by design, right?)

This would require file system format changes in the kernel, the
kernel on-line resizing code, e2fsck, and resize2fs for off-line
resizing.  And while we've considered doing (3) for other reasons,
that's not sufficient for this use case, because when we add new block
groups, we have to add block and inode allocation bitmaps, the inode
table, and the block group descriptor blocks.  It's not just the inode
table.

> 2. push things out of the way when the expansion is done
>
> I could attempt to code something to do 2., but I would either have to
> study resize2fs code, which is not trivial, or write something from scratch,
> based only on the layout docs, which would be also complex and not easily
> mergeable in resize2fs.
>
> 4. have an offline way (custom tool, or detecting conflicting files and
> temporarily removing them, ...) to free the needed blocks
>
> At the moment the best option I have is to continue doing what I've been
> doing for years already: use dumpe2fs and debugfs to discover which bg
> contain metadata+journal and selectively use "pvmove" to migrate
> those extents (PE) to the fast PV. Automatable, but still messy.
> Discovering "packed_meta_blocks" turned out not a so great finding as I was
> hoping, if then you can't resize.

Honestly, I suspect that automating the code to determine which PEs
hold the block group descriptors, inode table blocks, and allocation
bitmap blocks, and so should be migrated to the fast PV, is probably
the simplest thing to do.  You should be able to do this using just
dumpe2fs; the journal is generally not going to move during a
migration.

					- Ted
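
A minimal sketch of the dumpe2fs-based approach suggested above, not a
tested tool.  It assumes the per-group lines in dumpe2fs output look
like "Group descriptors at 1-1", "Block bitmap at 1025 (+1025)" and
"Inode table at 1057-1568 (+1057)"; the function names
metadata_block_ranges and logical_extents are hypothetical.  It only
yields extent indices relative to the start of the LV; translating
those to physical extents for pvmove depends on the LV's segment
layout and is not shown.

```python
import re

# A block number, optionally followed by "-<end>" for a range.
_RANGE = r"(\d+)(?:-(\d+))?"

# Assumed wording of the metadata lines in dumpe2fs per-group output.
_PATTERNS = [
    re.compile(r"Group descriptors at " + _RANGE),
    re.compile(r"Block bitmap at " + _RANGE),
    re.compile(r"Inode bitmap at " + _RANGE),
    re.compile(r"Inode table at " + _RANGE),
]


def metadata_block_ranges(dumpe2fs_text):
    """Return (start, end) file system block ranges, inclusive."""
    ranges = []
    for line in dumpe2fs_text.splitlines():
        for pat in _PATTERNS:
            for m in pat.finditer(line):
                start = int(m.group(1))
                end = int(m.group(2)) if m.group(2) else start
                ranges.append((start, end))
    return ranges


def logical_extents(ranges, block_size, extent_size):
    """Map block ranges to sorted extent indices within the LV.

    Assumes the file system starts at offset 0 of the LV and that
    extent_size is a multiple of block_size (both in bytes).
    """
    blocks_per_extent = extent_size // block_size
    extents = set()
    for start, end in ranges:
        first = start // blocks_per_extent
        last = end // blocks_per_extent
        extents.update(range(first, last + 1))
    return sorted(extents)
```

The text passed to metadata_block_ranges would be the captured output
of dumpe2fs on the device; the block size and extent size have to be
supplied by the caller.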