Re: Journal under-reservation bug on first >2G file

On Sep 30, 2014, at 3:22 PM, Eric Sandeen <sandeen@xxxxxxxxxx> wrote:
> On 9/30/14 4:10 PM, Eric Sandeen wrote:
>> Hey all -
>> 
>> So the following testcase will overrun the 1-credit journal reservation
>> made during a delalloc write in ext4_da_write_begin(), because we
>> may cross the 2G threshold, and need to modify both the inode and the
>> superblock in the same transaction.
>> 
>> I see a few ways to fix this:
>> 
>> 1) Always set LARGE_FILE on mount if not set.  This will break
>>   RW compatibility with very old kernels.  Do we care?
> 
>  1.5) Don't update the feature on the fly - we don't for
>       HUGE_FILE, either.
> 
>  1.5a) Always set the large_file feature with a fresh mkfs, instead
>        of relying on the accident of the resize inode being > 2G!

I think that 1.5a is definitely the way to go for new mke2fs, I'm a
bit surprised that we didn't do this for "-t ext4" a long time ago
given that we've enabled lots of other features automatically.

There shouldn't be any problem doing this retroactively in e2fsck,
and potentially at mount time, for filesystems that already have some
features enabled that are post-large_file (e.g. extents, flex_bg, etc.)
This would not introduce any compatibility issues, because any
kernel that supports those features already understands large_file.
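
A minimal sketch of that check, assuming it runs against the superblock
feature masks. The helper name can_force_large_file() is hypothetical and
is not an existing kernel or e2fsprogs function; the bit values are the
ones used by the ext4 on-disk format, and only the two newer features
named above are listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits from the ext4 on-disk format. */
#define RO_COMPAT_LARGE_FILE 0x0002u
#define INCOMPAT_EXTENTS     0x0040u
#define INCOMPAT_FLEX_BG     0x0200u

/*
 * Hypothetical helper: returns true when large_file is not yet set but
 * the filesystem already carries a feature newer than large_file, so
 * any kernel able to mount it must also understand large_file.
 */
static bool can_force_large_file(uint32_t incompat, uint32_t ro_compat)
{
    const uint32_t newer = INCOMPAT_EXTENTS | INCOMPAT_FLEX_BG;

    if (ro_compat & RO_COMPAT_LARGE_FILE)
        return false;           /* already set, nothing to do */
    return (incompat & newer) != 0;
}
```

A plain ext2/ext3 image with neither feature would be left alone, which
keeps RW compatibility with very old kernels for exactly the filesystems
that might still need it.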

I'm pretty sure that e2fsck doesn't turn off large_file automatically
anymore if it can't find any files over 2GB, but it is worthwhile to
verify this.

>> 2) Bump the reservation to 2 under the fiddly condition of
>>   large file not yet set but this write might do it
>> 3) bump the delalloc reservation to 2 just in case, always

Given how many other reservations we have for normal operations,
I don't think it is so bad to reserve an extra block if the
large_file feature isn't enabled yet.  This could be fine tuned
based on the size and offset of the write, but I'm not sure if
the extra complexity warrants it.

It doesn't make sense to reserve this block if the feature
is already set, and I don't think that there are (m)any features
that are turned on automatically by the kernel anymore so it is
overhead to reserve the block if you know it won't be needed.
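
Roughly what I have in mind, as a sketch only. da_write_credits() and
its parameters are made up for illustration and are not the actual
ext4_da_write_begin() code; the 0x7fffffff limit is taken from the
testcase, where the write that ends at byte 2147483647 is the one that
trips the flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest file size that does not require the large_file feature. */
#define LARGE_FILE_LIMIT 0x7fffffffLL

/*
 * Hypothetical helper: journal credits to reserve for a delalloc write
 * of len bytes at offset pos.  One credit covers the inode; a second is
 * added for the superblock only if large_file might get set.
 */
static int da_write_credits(bool large_file_set, int64_t pos,
                            uint32_t len, bool fine_tune)
{
    int credits = 1;                    /* inode block */

    if (large_file_set)
        return credits;                 /* superblock won't be touched */
    if (fine_tune && pos + (int64_t)len <= LARGE_FILE_LIMIT)
        return credits;                 /* write stays under the limit */
    return credits + 1;                 /* inode + superblock */
}
```

With fine_tune off this is the "reserve 2 whenever the feature is
missing" variant; with it on, only writes that can actually push the
size past the limit pay for the extra block.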

I don't know if this is belt and suspenders, but it might be
something to consider for supporting older kernels and we may not
need it in newer kernels.

Cheers, Andreas

>> I'll be happy to write the patch to fix it, just wondering what
>> people think the best approach is
>> 
>> Thoughts?
>> -Eric
>> 
>> 
>> #!/bin/bash
>> 
>> # A 400m fs won't get the large_file feature, oddly
>> # enough, because the resize inode will be < 2G.
>> 
>> truncate --size=400m test.img
>> mkfs.ext4 -F test.img
>> # This shouldn't have large_file set, exit if it does for some reason
>> dumpe2fs -h test.img | grep large_file && exit
>> 
>> mkdir -p mnt
>> mount -o loop test.img mnt
>> 
>> echo "writing 1 byte at 2147483646" 
>> dd if=/dev/zero of=mnt/testfile bs=1 seek=2147483646 count=1 conv=notrunc
>> sync
>> 
>> # This will make sure i_disksize is on disk, and
>> # that the buffer will be mapped on the next write.
>> #
>> # This is critical because ext4_da_should_update_i_disksize()
>> # checks buffer_mapped():
>> #
>> #        if (!buffer_mapped(bh) || (buffer_delay(bh)) || buffer_unwritten(bh))
>> #                return 0;
>> #        return 1;
>> 
>> # This tries to update i_disksize, and also requires a superblock
>> # update for the large_file feature flag, but only has 1 credit
>> # available on the delalloc write path
>> 
>> echo "writing 1 byte at 2147483647"
>> dd if=/dev/zero of=mnt/testfile bs=1 seek=2147483647 count=1 conv=notrunc
>> 
>> # Should go boom, but if not, unmount
>> umount mnt
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html



