On Sep 30, 2014, at 3:22 PM, Eric Sandeen <sandeen@xxxxxxxxxx> wrote:
> On 9/30/14 4:10 PM, Eric Sandeen wrote:
>> Hey all -
>>
>> So the following testcase will overrun the 1-credit journal reservation
>> made during a delalloc write in ext4_da_write_begin(), because we
>> may cross the 2G threshold, and need to modify both the inode and the
>> superblock in the same transaction.
>>
>> I see a few ways to fix this:
>>
>> 1) Always set LARGE_FILE on mount if not set.  This will break
>>    RW compatibility with very old kernels.  Do we care?
>
> 1.5) Don't update the feature on the fly - we don't for
>      HUGE_FILE, either.
>
> 1.5a) Always set the large_file feature with a fresh mkfs, instead
>       of relying on the accident of the resize inode being > 2G!

I think that 1.5a is definitely the way to go for new mke2fs. I'm a bit
surprised that we didn't do this for "-t ext4" a long time ago, given
that we've enabled lots of other features automatically.

There shouldn't be any problem doing this retroactively in e2fsck, and
potentially at mount time, for filesystems that already have some
features enabled that are post-large_file (e.g. extents, flex_bg, etc.).
This definitely would not impose any compatibility issues, because any
kernel that supports those features already understands large_file.

I'm pretty sure that e2fsck doesn't turn off large_file automatically
anymore if it can't find any files over 2GB, but it is worthwhile to
verify this.

>> 2) Bump the reservation to 2 under the fiddly condition of
>>    large file not yet set but this write might do it
>> 3) bump the delalloc reservation to 2 just in case, always

Given how many other reservations we have for normal operations, I don't
think it is so bad to reserve an extra block if the large_file feature
isn't enabled yet. This could be fine-tuned based on the size and offset
of the write, but I'm not sure if the extra complexity warrants it.
It doesn't make sense to reserve this block if the feature is already
set, and I don't think that there are (m)any features that are turned on
automatically by the kernel anymore, so it is pure overhead to reserve
the block if you know it won't be needed.

I don't know if this is belt and suspenders, but it might be something
to consider for supporting older kernels, and we may not need it in
newer kernels.

Cheers, Andreas

>> I'll be happy to write the patch to fix it, just wondering what
>> people think the best approach is
>>
>> Thoughts?
>> -Eric
>>
>>
>> #!/bin/bash
>>
>> # A 400m fs won't get the large_file feature, oddly
>> # enough, because the resize inode will be < 2G.
>>
>> truncate --size=400m test.img
>> mkfs.ext4 -F test.img
>> # This shouldn't have large_file set, exit if it does for some reason
>> dumpe2fs -h test.img | grep large_file && exit
>>
>> mkdir -p mnt
>> mount -o loop test.img mnt
>>
>> echo "writing 1 byte at 2147483646"
>> dd if=/dev/zero of=mnt/testfile bs=1 seek=2147483646 count=1 conv=notrunc
>> sync
>>
>> # This will make sure i_disksize is on disk, and
>> # that the buffer will be mapped on the next write.
>> #
>> # This is critical because ext4_da_should_update_i_disksize()
>> # checks buffer_mapped():
>> #
>> #	if (!buffer_mapped(bh) || (buffer_delay(bh)) || buffer_unwritten(bh))
>> #		return 0;
>> #	return 1;
>>
>> # This tries to update i_disksize, and also requires a superblock
>> # update for the large_file feature flag, but only has 1 credit
>> # available on the delalloc write path
>>
>> echo "writing 1 byte at 2147483647"
>> dd if=/dev/zero of=mnt/testfile bs=1 seek=2147483647 count=1 conv=notrunc
>>
>> # Should go boom, but if not, unmount
>> umount mnt
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
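
For illustration only, here is a rough sketch of what option 2 from the thread (reserve the second credit only when large_file is not yet set and this write might set it) could look like. It is not code from either poster and not the actual ext4 implementation: the helper name da_write_credits(), its parameters, and the MAX_NON_LFS_SIZE macro are all hypothetical, and the sketch assumes the feature is needed once the file size exceeds 0x7fffffff, which matches the offsets used in the testcase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: largest file size that does not need large_file (2^31 - 1). */
#define MAX_NON_LFS_SIZE ((uint64_t)0x7fffffff)

/*
 * Hypothetical helper, not a kernel function: returns the number of
 * journal credits a delalloc write would reserve under option 2.
 */
static int da_write_credits(bool large_file_set, uint64_t pos, uint64_t len)
{
	int credits = 1;	/* the inode block, as in the existing reservation */

	/*
	 * Reserve one more credit for the superblock only when this write
	 * could push the file size past the threshold while the feature
	 * flag is still unset.
	 */
	if (!large_file_set && pos + len > MAX_NON_LFS_SIZE)
		credits++;

	return credits;
}
```

With the testcase's numbers, the first write (1 byte at offset 2147483646) ends exactly at the threshold and would keep the single credit, while the second write (1 byte at offset 2147483647) would get two. Option 3 corresponds to dropping the offset check, and a filesystem with large_file already set would always stay at one credit.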