On Tue, May 17, 2011 at 11:13 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote:
>> On 5/17/11 5:59 PM, Jiaying Zhang wrote:
>> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks
>> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE
>> > flag and then ftruncate the file to a size larger than the file's i_size,
>> > any allocated but unwritten blocks will be freed but the file size is set
>> > to the size that ftruncate specifies.
>> >
>> > Here is a simple test to reproduce the problem:
>> >   1. fallocate a 12k size file with KEEP_SIZE flag
>> >   2. write the first 4k
>> >   3. ftruncate the file to 8k
>> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs
>> > shows the file has only the first written block left.
>>
>> To be honest I'm not 100% certain what the filesystem -should- do in this case.
>>
>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten:
>>
>> # xfs_bmap -vp testfile
>> testfile:
>>  EXT: FILE-OFFSET   BLOCK-RANGE              AG AG-OFFSET                TOTAL FLAGS
>>    0: [0..7]:       2648750760..2648750767    3 (356066400..356066407)       8 00000
>>    1: [8..23]:      2648750768..2648750783    3 (356066408..356066423)      16 10000
>
> Ok, so that's the case for a _truncate up_ from 4k to 8k:
>
> $ rm /mnt/test/foo
> $ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 0
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> /mnt/test/foo:
>  EXT: FILE-OFFSET   BLOCK-RANGE   AG AG-OFFSET      TOTAL FLAGS
>    0: [0..23]:      9712..9735     0 (9712..9735)      24 10000
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec)
> /mnt/test/foo:
>  EXT: FILE-OFFSET   BLOCK-RANGE   AG AG-OFFSET      TOTAL FLAGS
>    0: [0..7]:       9712..9719     0 (9712..9719)       8 00000
>    1: [8..23]:      9720..9735     0 (9720..9735)      16 10000
> /mnt/test/foo:
>  EXT: FILE-OFFSET   BLOCK-RANGE   AG AG-OFFSET      TOTAL FLAGS
>    0: [0..7]:       9712..9719     0 (9712..9719)       8 00000
>    1: [8..23]:      9720..9735     0 (9720..9735)      16 10000
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 8192
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 2
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
>
> But you get a different result on truncate down:
>
> $ rm /mnt/test/foo
> $ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 12288
> stat.blocks = 24
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 1
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
> /mnt/test/foo:
>  EXT: FILE-OFFSET   BLOCK-RANGE   AG AG-OFFSET      TOTAL FLAGS
>    0: [0..23]:      9584..9607     0 (9584..9607)      24 10000
> wrote 4096/4096 bytes at offset 0
> 4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec)
> /mnt/test/foo:
>  EXT: FILE-OFFSET   BLOCK-RANGE   AG AG-OFFSET      TOTAL FLAGS
>    0: [0..7]:       9584..9591     0 (9584..9591)       8 00000
>    1: [8..23]:      9592..9607     0 (9592..9607)      16 10000
> /mnt/test/foo:
>  EXT: FILE-OFFSET   BLOCK-RANGE   AG AG-OFFSET      TOTAL FLAGS
>    0: [0..7]:       9584..9591     0 (9584..9591)       8 00000
>    1: [8..15]:      9592..9599     0 (9592..9599)       8 10000
> fd.path = "/mnt/test/foo"
> fd.flags = non-sync,non-direct,read-write
> stat.ino = 71
> stat.type = regular file
> stat.size = 8192
> stat.blocks = 16
> fsxattr.xflags = 0x2 [-p------------]
> fsxattr.projid = 0
> fsxattr.extsize = 0
> fsxattr.nextents = 2
> fsxattr.naextents = 0
> dioattr.mem = 0x200
> dioattr.miniosz = 512
> dioattr.maxiosz = 2147483136
>
> IOWs, on XFS a truncate up does not change the preallocation at all,
> while a truncate down will _always_ remove preallocation beyond the
> new EOF. It's always had this behaviour w.r.t. truncate(2) and
> preallocation beyond EOF.
>
>> I think this is a different result from ext4, either with or without your patch.
>>
>> On ext4 I get size 8k, but only the first 4k mapped, as you say.
>>
>> I don't recall when truncate is supposed to free fallocated blocks, and from what point?
>
> It's entirely up to the filesystem how it treats blocks beyond EOF
> during truncation. XFS frees them on truncate down, because it is
> much safer to just truncate away everything beyond the new EOF than
> to leave written extents beyond EOF as potential landmines.
>
> Indeed, that's why calling vmtruncate() is a bad fix. If you have:
>
>
>         NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU
> ....----+----------+--------+--------+
>         A          B        C        D
>
> Where A = new EOF (N)
>       A->B = unwritten (U)
>       B->C = written (W)
>       C = old EOF (O)
>       C->D = unwritten (U)
>
> Then just calling vmtruncate() will leave the blocks in the range
> B->C as written blocks. Hence then doing an extending truncate back
> out to D will expose stale data rather than zeros in the range
> B->C....

Sorry, I am a little confused. If I understand correctly, in the situation
you described, we call a truncate that causes EOF to change from C to A.
On ext4, we should free all of the blocks after A. And when we do an
extending truncate to D, any blocks beyond A should be treated as unwritten
blocks, so we should not expose any stale data, right?

Jiaying

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
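
To make the A/B/C/D scenario above concrete, here is a toy Python model. It is only a sketch and not kernel code: `ToyFile`, `scenario` and the block numbers chosen for A, B, C and D are all made up for illustration, and the `free_beyond_eof` switch is an assumption that stands in for the two behaviours being discussed (leaving the B->C blocks in place as written versus freeing everything past the new EOF on truncate down). It says nothing about what ext4 or XFS actually do internally.

```python
from dataclasses import dataclass, field

ZERO = "zeros"


@dataclass
class ToyFile:
    """Toy file: size in blocks plus a map of block index -> (state, payload)."""
    size: int = 0
    blocks: dict = field(default_factory=dict)

    def preallocate(self, start, end):
        # Reserve blocks as unwritten ('U') without changing size (KEEP_SIZE-like).
        for b in range(start, end):
            self.blocks.setdefault(b, ("U", None))

    def write(self, start, end, payload):
        # Writing converts blocks to written ('W') and may extend the size.
        for b in range(start, end):
            self.blocks[b] = ("W", payload)
        self.size = max(self.size, end)

    def truncate(self, new_size, free_beyond_eof):
        # On a shrinking truncate, optionally free every block at or past the new EOF.
        if free_beyond_eof and new_size < self.size:
            self.blocks = {b: s for b, s in self.blocks.items() if b < new_size}
        self.size = new_size

    def read(self, b):
        # Holes and unwritten blocks read back as zeros; written blocks return their payload.
        if b >= self.size:
            raise ValueError("read past EOF")
        state, payload = self.blocks.get(b, ("U", None))
        return payload if state == "W" else ZERO


# Hypothetical block offsets for the points in the diagram.
A, B, C, D = 2, 4, 6, 8


def scenario(free_beyond_eof):
    f = ToyFile()
    f.preallocate(0, D)             # unwritten blocks out to D
    f.write(B, C, "old data")       # B->C written, EOF is now C
    f.truncate(A, free_beyond_eof)  # truncate down: EOF moves from C to A
    f.truncate(D, free_beyond_eof)  # extending truncate back out to D
    return [f.read(b) for b in range(B, C)]


if __name__ == "__main__":
    print("blocks left in place: ", scenario(False))
    print("blocks freed past EOF:", scenario(True))
```

In this model, stale data shows up in B->C only when the truncate down leaves the written blocks in place. If everything past A is freed first, the extending truncate exposes only holes, which read back as zeros.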