On Fri, Aug 28, 2009 at 3:14 PM, Andreas Dilger<adilger@xxxxxxx> wrote:
> On Aug 28, 2009 14:44 -0700, Jiaying Zhang wrote:
>> On Fri, Aug 28, 2009 at 12:40 PM, Andreas Dilger<adilger@xxxxxxx> wrote:
>> > This isn't really correct, however, because i_blocks also contains
>> > non-data blocks (indirect/index, EA, etc) blocks, so even with small
>> > files with ACLs i_blocks may always be larger than ia_size >> 9, and
>> > for ext2/3 at least this will ALWAYS be true for files > 48kB in size.
>>
>> I see. I guess we need to use a special flag then. Or is there any
>> other suggestions? I also have another question related to this
>> problem. Why those fallocated blocks are not marked as preallocated
>> blocks that will then be automatically freed in ext4_release_file?
>
> Because fallocate() means "persistent allocation on disk", not "in memory
> preallocation". The "in memory" preallocation already happens in ext4,
> and it is released when the inode is cleaned up.

Right. Thanks for pointing this out!

RFC, here is a patch that Frank and I have been working on. It introduces
a new fs flag to mark that the file has been allocated beyond its EOF, as
discussed previously in this thread. The flag is cleared in the subsequent
vmtruncate or fallocate without KEEPSIZE. It is possible that a vmtruncate
may be called unnecessarily in the case that the file is written beyond the
allocated size, but I think it is ok to pay this cost to get correctness.
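
To make the intended flag lifecycle easier to follow, here is a toy
user-space model of it. This is only a sketch and not kernel code: the
names toy_inode, toy_fallocate, toy_setattr_size, toy_truncate and
TOY_KEEPSIZE_FL are hypothetical stand-ins for the inode, fallocate(),
inode_setattr(), vmtruncate() and FS_KEEPSIZE_FL, and it assumes byte
granularity instead of real block accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for FS_KEEPSIZE_FL (same bit value as the patch). */
#define TOY_KEEPSIZE_FL 0x00200000u

/* Toy inode: byte-granular, no block or extent accounting. */
struct toy_inode {
	uint64_t i_size;	/* logical file size */
	uint64_t allocated;	/* end of on-disk allocation, may exceed i_size */
	uint32_t i_flags;
};

/* Models the truncate path: frees allocation past 'size', clears the flag. */
static void toy_truncate(struct toy_inode *inode, uint64_t size)
{
	inode->i_size = size;
	if (inode->allocated > size)
		inode->allocated = size;
	inode->i_flags &= ~TOY_KEEPSIZE_FL;
}

/*
 * Models the patched size check: truncate when the size changes OR when
 * blocks were allocated beyond EOF. Returns true if a truncate was done.
 */
static bool toy_setattr_size(struct toy_inode *inode, uint64_t ia_size)
{
	if (ia_size != inode->i_size ||
	    (inode->i_flags & TOY_KEEPSIZE_FL)) {
		toy_truncate(inode, ia_size);
		return true;
	}
	return false;
}

/* Models fallocate(); keep_size plays the role of FALLOC_FL_KEEP_SIZE. */
static void toy_fallocate(struct toy_inode *inode, uint64_t offset,
			  uint64_t len, bool keep_size)
{
	uint64_t new_size = offset + len;

	assert(len > 0);
	if (new_size > inode->allocated)
		inode->allocated = new_size;
	if (!keep_size) {
		if (new_size > inode->i_size)
			inode->i_size = new_size;
		inode->i_flags &= ~TOY_KEEPSIZE_FL;
	} else {
		/* Allocation may now extend past EOF: remember that. */
		inode->i_flags |= TOY_KEEPSIZE_FL;
	}
}
```

In this model, a 4096-byte file that gets toy_fallocate(&ino, 0, 8192, true)
keeps i_size at 4096 but has the flag set, so a following
toy_setattr_size(&ino, 4096) still truncates and drops the extra
allocation. Without the flag check, the equal-size call would be skipped.
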

--- .pc/fallocate_keepsizse.patch/fs/attr.c	2009-08-28 15:38:46.000000000 -0700
+++ fs/attr.c	2009-08-28 17:01:04.000000000 -0700
@@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
 	unsigned int ia_valid = attr->ia_valid;

 	if (ia_valid & ATTR_SIZE &&
-	    (attr->ia_size != i_size_read(inode)) {
+	    (attr->ia_size != i_size_read(inode) ||
+	    (inode->i_flags & FS_KEEPSIZE_FL))) {
 		int error = vmtruncate(inode, attr->ia_size);
 		if (error)
 			return error;
--- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c	2009-08-28 15:37:45.000000000 -0700
+++ fs/ext4/extents.c	2009-08-28 17:27:27.000000000 -0700
@@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str
 			i_size_write(inode, new_size);
 		if (new_size > EXT4_I(inode)->i_disksize)
 			ext4_update_i_disksize(inode, new_size);
+		inode->i_flags &= ~FS_KEEPSIZE_FL;
 	} else {
+		/*
+		 * Mark that we allocate beyond EOF so the subsequent truncate
+		 * can proceed even if the new size is the same as i_size.
+		 */
+		inode->i_flags |= FS_KEEPSIZE_FL;
 	}

 }
--- .pc/fallocate_keepsizse.patch/fs/ext4/inode.c	2009-08-16 14:19:38.000000000 -0700
+++ fs/ext4/inode.c	2009-08-28 16:59:42.000000000 -0700
@@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode)

 	if (!ext4_can_truncate(inode))
 		return;

+	inode->i_flags &= ~FS_KEEPSIZE_FL;
+
 	if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC))
 		ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE;
--- .pc/fallocate_keepsizse.patch/include/linux/fs.h	2009-08-28 15:44:27.000000000 -0700
+++ include/linux/fs.h	2009-08-28 17:00:47.000000000 -0700
@@ -343,6 +343,7 @@ struct inodes_stat_t {
 #define FS_TOPDIR_FL			0x00020000 /* Top of directory hierarchies*/
 #define FS_EXTENT_FL			0x00080000 /* Extents */
 #define FS_DIRECTIO_FL			0x00100000 /* Use direct i/o */
+#define FS_KEEPSIZE_FL			0x00200000 /* Blocks allocated beyond EOF */
 #define FS_RESERVED_FL			0x80000000 /* reserved for ext2 lib */

 #define FS_FL_USER_VISIBLE		0x0003DFFF /* User visible flags */

Jiaying

>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html