On Tue, Apr 17, 2018 at 11:58:16AM -0600, Andreas Dilger wrote:
> I was just looking at a posting on stackexchange related to mke2fs
> and filesystem image fragmentation:
>
> https://unix.stackexchange.com/questions/287133/why-is-fragmentation-level-so-huge-in-files-that-contain-other-filesystems
>
> It seems like "mke2fs" is causing issues for creation of filesystems in
> preallocated files, as it is discarding the previously-allocated blocks,
> even if the "-E nodiscard" option is used.
>
> # dd if=/dev/zero of=/var/tmp/tt bs=1M count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 0.272006 s, 385 MB/s
> # sync
> # filefrag -v /var/tmp/tt
> Filesystem type is: ef53
> File size of /var/tmp/tt is 104857600 (25600 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..   24575:    1261568..   1286143:  24576:
>    1:    24576..   25599:    1292288..   1293311:   1024:    1286144: last,eof
> /var/tmp/tt: 2 extents found
> # ./misc/mke2fs -E nodiscard /var/tmp/tt
> mke2fs 1.43-WIP (15-Mar-2016)
> Creating filesystem with 102400 1k blocks and 25688 inodes
> Filesystem UUID: 3f8a0ca8-70b8-4801-b834-99dff0a6a642
> Superblock backups stored on blocks:
>         8193, 24577, 40961, 57345, 73729
>
> Allocating group tables: done
> Writing inode tables: done
> Writing superblocks and filesystem accounting information: done
>
> [root@mookie e2fsprogs-git]# filefrag -v /var/tmp/tt
> Filesystem type is: ef53
> File size of /var/tmp/tt is 104857600 (25600 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:        0..      65:    1261568..   1261633:     66:
>    1:      127..    2113:    1261695..   1263681:   1987:    1261634:
>    2:     2175..    4096:    1263743..   1265664:   1922:    1263682:
>    3:     4158..    6209:    1265726..   1267777:   2052:    1265665:
>    4:     6271..    8192:    1267839..   1269760:   1922:    1267778:
>    5:     8254..   10305:    1269822..   1271873:   2052:    1269761:
>    6:    10367..   12288:    1271935..   1273856:   1922:    1271874:
>    7:    12350..   14401:    1273918..   1275969:   2052:    1273857:
>    8:    14463..   16384:    1276031..   1277952:   1922:    1275970:
>    9:    16446..   18497:    1278014..   1280065:   2052:    1277953:
>   10:    18559..   20480:    1280127..   1282048:   1922:    1280066:
>   11:    20542..   22528:    1282110..   1284096:   1987:    1282049:
>   12:    22590..   24575:    1284158..   1286143:   1986:    1284097:
>   13:    24576..   24576:    1292288..   1292288:      1:    1286144:
>   14:    24638..   25583:    1292350..   1293295:    946:    1292289: last
> /var/tmp/tt: 15 extents found
>
>
> Without "-E nodiscard", mke2fs prints the message "Discarding device blocks:
> done" and calls fallocate() (from strace):
>     fallocate(3, 03, 0, 1048576) = 0
>     fallocate(3, 03, 1048576, 103809024) = 0
> which is NOT called in the "-E nodiscard" case, but it somehow is discarding
> the allocated blocks anyway during inode table allocation:
>
> write(1, "Writing inode tables: ", 22) = 22
> write(1, " 0/13", 5) = 5
> write(1, "\10\10\10\10\10", 5) = 5
> fstat(3, {st_mode=S_IFREG|0644, st_size=104857600, ...}) = 0
> fallocate(3, 03, 267264, 252928) = 0

Hmmm, unix_zeroout() really ought to be trying ZERO_RANGE before resorting
to PUNCH_HOLE, since ZERO_RANGE fills holes with unwritten extents and
converts written to unwritten. (Or at least it does on ext4 and xfs...)

Also looking at that function, it's probably time to go back and fix it
for block devices since 'the BLKZEROOUT mess' can be avoided by using block
device fallocate.

--D

> fstat(3, {st_mode=S_IFREG|0644, st_size=104857600, ...}) = 0
> fallocate(3, 03, 8655872, 252928) = 0
> fstat(3, {st_mode=S_IFREG|0644, st_size=104857600, ...}) = 0
> fallocate(3, 03, 16780288, 252928) = 0
> :
> :
>
>
> Cheers, Andreas
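
For illustration, the ordering suggested in the reply (try ZERO_RANGE first,
fall back to PUNCH_HOLE only if that fails) could look roughly like the sketch
below. This is not the actual unix_zeroout() code from e2fsprogs: the names
zero_file_range(), write_zeros() and enum zero_method are hypothetical, and
the final fallback to writing zeros explicitly is an assumption added so the
sketch works on filesystems that support neither mode. Mode 03 in the strace
output above corresponds to FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE
(0x01 | 0x02).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes fallocate() and FALLOC_FL_* in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older headers; values as defined in linux/falloc.h. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE  0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE 0x02
#endif
#ifndef FALLOC_FL_ZERO_RANGE
#define FALLOC_FL_ZERO_RANGE 0x10
#endif

/* Hypothetical result type: reports which method zeroed the range. */
enum zero_method {
    ZERO_BY_ZERO_RANGE,   /* blocks stay allocated */
    ZERO_BY_PUNCH_HOLE,   /* blocks were deallocated (file becomes sparse) */
    ZERO_BY_WRITE,        /* zeros were written explicitly */
    ZERO_FAILED
};

/* Last-resort fallback (an assumption of this sketch): write zeros by hand. */
static int write_zeros(int fd, off_t offset, off_t len)
{
    static const char zeros[65536];

    assert(offset >= 0 && len >= 0);
    while (len > 0) {
        size_t chunk = (len < (off_t)sizeof(zeros)) ? (size_t)len : sizeof(zeros);
        ssize_t n = pwrite(fd, zeros, chunk, offset);

        if (n < 0 && errno == EINTR)
            continue;             /* interrupted, retry the same chunk */
        if (n <= 0)
            return -1;
        offset += n;
        len -= n;
    }
    return 0;
}

/* Zero [offset, offset + len) without changing the file size. */
enum zero_method zero_file_range(int fd, off_t offset, off_t len)
{
    if (offset < 0 || len <= 0)
        return ZERO_FAILED;

    /* 1. ZERO_RANGE: reads return zeros and the space stays allocated. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == 0)
        return ZERO_BY_ZERO_RANGE;

    /* 2. PUNCH_HOLE: reads return zeros but the blocks are freed. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, len) == 0)
        return ZERO_BY_PUNCH_HOLE;

    /* 3. Neither mode is supported here: write the zeros ourselves. */
    return write_zeros(fd, offset, len) == 0 ? ZERO_BY_WRITE : ZERO_FAILED;
}
```

A caller would use zero_file_range(fd, offset, len) wherever it currently
punches a hole to zero a region. Which branch is taken depends on the
filesystem holding the image file, so the returned enum zero_method is mainly
useful for checking whether the preallocated blocks were kept.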