Re: [PATCH v2] mmp: do not use O_DIRECT when working with regular file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> 19 февр. 2021 г., в 16:34, Lukas Czerner <lczerner@xxxxxxxxxx> написал(а):
> 
> On Fri, Feb 19, 2021 at 02:49:16PM +0300, Alexey Lyashkov wrote:
>> Lukas,
>> 
>> because e2fsprogs have an bad assumption about IO size for the O_DIRECT case.
>> and because library uses a code like
>>>> 
>> set_block_size(1k);
>> seek(fs, 1);
>> read_block();
>>>>> 
>> which caused an 1k read inside of 4k disk block size not aligned by block size, which is prohibited and caused an error report.
>> 
>> Reference to patch.
>> https://patchwork.ozlabs.org/project/linux-ext4/patch/20201023112659.1559-1-artem.blagodarenko@xxxxxxxxx/
> 
> Alright, I skimmed through your patch proposal and I am not sure I
> completely understand the problem because you have not provided the code
> adding O_DIRECT support for e2image.

debugfs -D … will hit same problem.

> 
> However I think that it is a reasonable assumption to make that there is
> not going to be a file system on a block device such that the fs blocksize
> is smaller than device sector size. You can't create such fs with mke2fs
> and you can't mount such file system either.
> 
This is don’t need to be create this FS, calling an ext2_open2 is enough.


> All that said I can now see that there is a problem in case of mke2fs
> and debugfs when used with O_DIRECT (-D) on a file system image with 1k
> block size stored on a file in the host file system on the block device
> with sector size larger than 1k (...I am getting Inception flashbacks now)

if you have open a large (>256T) device with debugfs without -D you will be see a large swap.
once this FS want to consume a ~10G for bitmaps and other parts.
with cached read you memory consumption increased by two.

btw.
you can easy replicate it with losetup which able to specify a block size.

> 
> In fact I can confirm that indeed, both mke2fs and debugfs will fail in
> such scenario. The question is whether we care enough to support
> O_DIRECT in such situations. Personally I don't care enough about this.
> However it would be nice to at least have a check (probably in
> ext2fs_open2, unix_open_channel or such) and notify user about the
> problem.

it’s not a tools problem. It’s problem of e2fsprogs library as ext2_open2 affected by this bug.
But this is not a single function where bug lives.


> 
> Note that this conversation does not affect my patch since
> ext2fs_mmp_read() does not use the unix_io infrastructure.
> 
It’s good to convert to use it.

Alex


> -Lukas
> 
>> 
>> Alex
>> 
>>> 19 февр. 2021 г., в 13:57, Lukas Czerner <lczerner@xxxxxxxxxx> написал(а):
>>> 
>>> On Fri, Feb 19, 2021 at 01:08:17PM +0300, Alexey Lyashkov wrote:
>>>> Andreas,
>>>> 
>>>> What about to disable a O_DIRECT global on any block devices in the e2fsprogs library as this don’t work on 4k disk drives at all ?
>>>> Instead of fixing an O_DIRECT access with patches sends early.
>>> 
>>> Why would it not work at all ? This is a fix for a specific problem and
>>> I am not currently aware of ony other problems e2fsprogs should have
>>> with 4k sector size drives. Do you have a specific problem in mind ?
>>> 
>>> Thanks!
>>> -Lukas
>>> 
>>>> 
>>>> 
>>>> Alex
>>>> 
>>>>> 19 февр. 2021 г., в 1:20, Andreas Dilger <adilger@xxxxxxxxx> написал(а):
>>>>> 
>>>>> On Feb 18, 2021, at 2:51 AM, Lukas Czerner <lczerner@xxxxxxxxxx> wrote:
>>>>>> 
>>>>>> Currently the mmp block is read using O_DIRECT to avoid any caching that
>>>>>> may be done by the VM. However when working with regular files this
>>>>>> creates alignment issues when the device of the host file system has
>>>>>> sector size larger than the blocksize of the file system in the file
>>>>>> we're working with.
>>>>>> 
>>>>>> This can be reproduced with t_mmp_fail test when run on the device with
>>>>>> 4k sector size because the mke2fs fails when trying to read the mmp
>>>>>> block.
>>>>>> 
>>>>>> Fix it by disabling O_DIRECT when working with regular files. I don't
>>>>>> think there is any risk of doing so since the file system layer, unlike
>>>>>> shared block device, should guarantee cache consistency.
>>>>>> 
>>>>>> Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
>>>>>> Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>
>>>>> 
>>>>> Reviewed-by: Andreas Dilger <adilger@xxxxxxxxx>
>>>>> 
>>>>>> ---
>>>>>> v2: Fix comment - it avoids problems when the sector size is larger not
>>>>>> smaller than blocksize
>>>>>> 
>>>>>> lib/ext2fs/mmp.c | 22 +++++++++++-----------
>>>>>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>>>>> 
>>>>>> diff --git a/lib/ext2fs/mmp.c b/lib/ext2fs/mmp.c
>>>>>> index c21ae272..cca2873b 100644
>>>>>> --- a/lib/ext2fs/mmp.c
>>>>>> +++ b/lib/ext2fs/mmp.c
>>>>>> @@ -57,21 +57,21 @@ errcode_t ext2fs_mmp_read(ext2_filsys fs, blk64_t mmp_blk, void *buf)
>>>>>> 	 * regardless of how the io_manager is doing reads, to avoid caching of
>>>>>> 	 * the MMP block by the io_manager or the VM.  It needs to be fresh. */
>>>>>> 	if (fs->mmp_fd <= 0) {
>>>>>> +		struct stat st;
>>>>>> 		int flags = O_RDWR | O_DIRECT;
>>>>>> 
>>>>>> -retry:
>>>>>> +		/*
>>>>>> +		 * There is no reason for using O_DIRECT if we're working with
>>>>>> +		 * regular file. Disabling it also avoids problems with
>>>>>> +		 * alignment when the device of the host file system has sector
>>>>>> +		 * size larger than blocksize of the fs we're working with.
>>>>>> +		 */
>>>>>> +		if (stat(fs->device_name, &st) == 0 &&
>>>>>> +		    S_ISREG(st.st_mode))
>>>>>> +			flags &= ~O_DIRECT;
>>>>>> +
>>>>>> 		fs->mmp_fd = open(fs->device_name, flags);
>>>>>> 		if (fs->mmp_fd < 0) {
>>>>>> -			struct stat st;
>>>>>> -
>>>>>> -			/* Avoid O_DIRECT for filesystem image files if open
>>>>>> -			 * fails, since it breaks when running on tmpfs. */
>>>>>> -			if (errno == EINVAL && (flags & O_DIRECT) &&
>>>>>> -			    stat(fs->device_name, &st) == 0 &&
>>>>>> -			    S_ISREG(st.st_mode)) {
>>>>>> -				flags &= ~O_DIRECT;
>>>>>> -				goto retry;
>>>>>> -			}
>>>>>> 			retval = EXT2_ET_MMP_OPEN_DIRECT;
>>>>>> 			goto out;
>>>>>> 		}
>>>>>> --
>>>>>> 2.26.2
>>>>>> 
>>>>> 
>>>>> 
>>>>> Cheers, Andreas
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 





[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux