> 19 февр. 2021 г., в 16:34, Lukas Czerner <lczerner@xxxxxxxxxx> написал(а): > > On Fri, Feb 19, 2021 at 02:49:16PM +0300, Alexey Lyashkov wrote: >> Lukas, >> >> because e2fsprogs have an bad assumption about IO size for the O_DIRECT case. >> and because library uses a code like >>>> >> set_block_size(1k); >> seek(fs, 1); >> read_block(); >>>>> >> which caused an 1k read inside of 4k disk block size not aligned by block size, which is prohibited and caused an error report. >> >> Reference to patch. >> https://patchwork.ozlabs.org/project/linux-ext4/patch/20201023112659.1559-1-artem.blagodarenko@xxxxxxxxx/ > > Alright, I skimmed through your patch proposal and I am not sure I > completely understand the problem because you have not provided the code > adding O_DIRECT support for e2image. debugfs -D … will hit same problem. > > However I think that it is a reasonable assumption to make that there is > not going to be a file system on a block device such that the fs blocksize > is smaller than device sector size. You can't create such fs with mke2fs > and you can't mount such file system either. > This is don’t need to be create this FS, calling an ext2_open2 is enough. > All that said I can now see that there is a problem in case of mke2fs > and debugfs when used with O_DIRECT (-D) on a file system image with 1k > block size stored on a file in the host file system on the block device > with sector size larger than 1k (...I am getting Inception flashbacks now) if you have open a large (>256T) device with debugfs without -D you will be see a large swap. once this FS want to consume a ~10G for bitmaps and other parts. with cached read you memory consumption increased by two. btw. you can easy replicate it with losetup which able to specify a block size. > > In fact I can confirm that indeed, both mke2fs and debugfs will fail in > such scenario. The question is whether we care enough to support > O_DIRECT in such situations. Personally I don't care enough about this. > However it would be nice to at least have a check (probably in > ext2fs_open2, unix_open_channel or such) and notify user about the > problem. it’s not a tools problem. It’s problem of e2fsprogs library as ext2_open2 affected by this bug. But this is not a single function where bug lives. > > Note that this conversation does not affect my patch since > ext2fs_mmp_read() does not use the unix_io infrastructure. > It’s good to convert to use it. Alex > -Lukas > >> >> Alex >> >>> 19 февр. 2021 г., в 13:57, Lukas Czerner <lczerner@xxxxxxxxxx> написал(а): >>> >>> On Fri, Feb 19, 2021 at 01:08:17PM +0300, Alexey Lyashkov wrote: >>>> Andreas, >>>> >>>> What about to disable a O_DIRECT global on any block devices in the e2fsprogs library as this don’t work on 4k disk drives at all ? >>>> Instead of fixing an O_DIRECT access with patches sends early. >>> >>> Why would it not work at all ? This is a fix for a specific problem and >>> I am not currently aware of ony other problems e2fsprogs should have >>> with 4k sector size drives. Do you have a specific problem in mind ? >>> >>> Thanks! >>> -Lukas >>> >>>> >>>> >>>> Alex >>>> >>>>> 19 февр. 2021 г., в 1:20, Andreas Dilger <adilger@xxxxxxxxx> написал(а): >>>>> >>>>> On Feb 18, 2021, at 2:51 AM, Lukas Czerner <lczerner@xxxxxxxxxx> wrote: >>>>>> >>>>>> Currently the mmp block is read using O_DIRECT to avoid any caching that >>>>>> may be done by the VM. However when working with regular files this >>>>>> creates alignment issues when the device of the host file system has >>>>>> sector size larger than the blocksize of the file system in the file >>>>>> we're working with. >>>>>> >>>>>> This can be reproduced with t_mmp_fail test when run on the device with >>>>>> 4k sector size because the mke2fs fails when trying to read the mmp >>>>>> block. >>>>>> >>>>>> Fix it by disabling O_DIRECT when working with regular files. I don't >>>>>> think there is any risk of doing so since the file system layer, unlike >>>>>> shared block device, should guarantee cache consistency. >>>>>> >>>>>> Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx> >>>>>> Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx> >>>>> >>>>> Reviewed-by: Andreas Dilger <adilger@xxxxxxxxx> >>>>> >>>>>> --- >>>>>> v2: Fix comment - it avoids problems when the sector size is larger not >>>>>> smaller than blocksize >>>>>> >>>>>> lib/ext2fs/mmp.c | 22 +++++++++++----------- >>>>>> 1 file changed, 11 insertions(+), 11 deletions(-) >>>>>> >>>>>> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c >>>>>> index c21ae272..cca2873b 100644 >>>>>> --- a/lib/ext2fs/mmp.c >>>>>> +++ b/lib/ext2fs/mmp.c >>>>>> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf) >>>>>> * regardless of how the io_manager is doing reads, to avoid caching of >>>>>> * the MMP block by the io_manager or the VM. It needs to be fresh. */ >>>>>> if (fs->mmp_fd <= 0) { >>>>>> + struct stat st; >>>>>> int flags = O_RDWR | O_DIRECT; >>>>>> >>>>>> -retry: >>>>>> + /* >>>>>> + * There is no reason for using O_DIRECT if we're working with >>>>>> + * regular file. Disabling it also avoids problems with >>>>>> + * alignment when the device of the host file system has sector >>>>>> + * size larger than blocksize of the fs we're working with. >>>>>> + */ >>>>>> + if (stat(fs->device_name, &st) == 0 && >>>>>> + S_ISREG(st.st_mode)) >>>>>> + flags &= ~O_DIRECT; >>>>>> + >>>>>> fs->mmp_fd = open(fs->device_name, flags); >>>>>> if (fs->mmp_fd < 0) { >>>>>> - struct stat st; >>>>>> - >>>>>> - /* Avoid O_DIRECT for filesystem image files if open >>>>>> - * fails, since it breaks when running on tmpfs. */ >>>>>> - if (errno == EINVAL && (flags & O_DIRECT) && >>>>>> - stat(fs->device_name, &st) == 0 && >>>>>> - S_ISREG(st.st_mode)) { >>>>>> - flags &= ~O_DIRECT; >>>>>> - goto retry; >>>>>> - } >>>>>> retval = EXT2_ET_MMP_OPEN_DIRECT; >>>>>> goto out; >>>>>> } >>>>>> -- >>>>>> 2.26.2 >>>>>> >>>>> >>>>> >>>>> Cheers, Andreas >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>> >> >