On 19/12/2020, 01:03, "Andreas Dilger" <adilger@xxxxxxxxx> wrote:

> On Nov 19, 2020, at 5:26 AM, Lyashkov, Alexey <alexey.lyashkov@xxxxxxx> wrote:
> >
> > Tso,
> >
> > This situation was hit on a modern HDD with a 4k block size, with e2image
> > changed to use direct I/O instead of buffered I/O.
>
> It would be useful to include this patch for e2image as part of this submission,
> so that this can be tested. I suspect that O_DIRECT would be useful for other
> tools (e.g. e2fsck, debugfs, etc.) since the IO manager would avoid double
> buffering the data in both the kernel and userspace.

debugfs has a -D option already.
As for e2fsck, it runs in single-user mode and makes several passes over the FS, so caching is good to have there.
Don't forget that caching also lets readahead work, which is very useful when opening a large filesystem.

> > e2fsprogs tries to read the superblock at offset 1k, which causes the FS block
> > size to be set to 1k and the second block to be read.
> > (Many other places exist, but this is the simplest.)
>
> Are there actually other places where it is doing sub-block-size reads from disk?

Many places.

bash-3.2$ grep -rn io_channel_set_blksize * | grep SUPERBLOCK
lib/ext2fs/undo_io.c:223: io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
lib/ext2fs/undo_io.c:506: io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
lib/ext2fs/closefs.c:201: io_channel_set_blksize(fs->io, SUPERBLOCK_OFFSET);
lib/ext2fs/openfs.c:218: io_channel_set_blksize(fs->io, SUPERBLOCK_OFFSET);
misc/mke2fs.c:2573: io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
misc/e2undo.c:168: io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);

There are also some places where set_blksize is called with a size different from the block device's block size.
In theory we can create an FS with a 1K block size, and the tools should be able to work with it.
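
To make the sub-block case concrete, here is a rough sketch (not code from e2fsprogs) of how a 1K request would have to be widened to an aligned range before it can be issued with O_DIRECT. It assumes the usual Linux constraint that direct I/O offsets and lengths are multiples of the device's logical block size. The struct and the function name align_request() are hypothetical, used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of libext2fs. */
struct aligned_span {
    uint64_t start;   /* aligned offset at which the real read starts */
    uint64_t length;  /* aligned number of bytes to read */
    uint64_t skip;    /* bytes to skip in the bounce buffer to reach the data */
};

/*
 * Widen the request [offset, offset + count) to the smallest range whose
 * start and length are both multiples of 'align' (assumed to be a power
 * of two, e.g. a 4096-byte logical block size).
 */
static struct aligned_span align_request(uint64_t offset, uint64_t count,
                                         uint64_t align)
{
    struct aligned_span s;

    assert(align != 0 && (align & (align - 1)) == 0);
    s.start = offset & ~(align - 1);
    s.skip = offset - s.start;
    s.length = (s.skip + count + align - 1) & ~(align - 1);
    return s;
}
```

With a 4096-byte device block, a 1K read at offset 1K (the superblock case) becomes a 4K read at offset 0, with the wanted data 1K into the buffer.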
> It seems simpler to fix the superblock read at open to always read the first 4KB
> into a buffer (and to make it easy to extend to 16KB or 64KB if sector sizes get
> even larger), then find the superblock within the buffer to decide the blocksize.

That would then have to be done in many places, including metadata reads when the FS block size is 1k.
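
For reference, a minimal sketch of the "read the first 4KB, then find the superblock in the buffer" idea. It is an illustration only, not the libext2fs implementation: the function name sb_block_size_from_buffer() and the SB_* constants are hypothetical, and the field offsets (s_log_block_size at byte 24, s_magic at byte 56 of the superblock) are taken from the ext2/ext4 on-disk layout. The caller is assumed to have already read the first bytes of the device into an aligned buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET        1024u   /* same value as SUPERBLOCK_OFFSET */
#define SB_SIZE          1024u   /* on-disk superblock size */
#define SB_LOG_BLOCK_OFF 24u     /* s_log_block_size, little-endian u32 */
#define SB_MAGIC_OFF     56u     /* s_magic, little-endian u16 */
#define SB_MAGIC         0xEF53u /* ext2/3/4 magic number */
#define SB_MAX_LOG       6u      /* assumed limit: 1024 << 6 = 64 KiB */

static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * 'buf' holds the first 'len' bytes of the device (e.g. one 4 KiB read).
 * Returns the filesystem block size in bytes, or 0 if the buffer is too
 * short or does not contain a plausible superblock.
 */
static uint32_t sb_block_size_from_buffer(const unsigned char *buf, size_t len)
{
    const unsigned char *sb;
    uint32_t log_size;

    assert(buf != NULL);
    if (len < SB_OFFSET + SB_SIZE)
        return 0;
    sb = buf + SB_OFFSET;
    if (get_le16(sb + SB_MAGIC_OFF) != SB_MAGIC)
        return 0;
    log_size = get_le32(sb + SB_LOG_BLOCK_OFF);
    if (log_size > SB_MAX_LOG)
        return 0;
    return 1024u << log_size;
}
```

This covers only the open path. The other io_channel_set_blksize() callers and the 1k-block metadata reads would each still need their own handling.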