Re: Question on fallocate/ftruncate sequence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks a lot for the comments and suggestions!

On Sat, Aug 29, 2009 at 7:52 PM, Theodore Tso<tytso@xxxxxxx> wrote:
> On Fri, Aug 28, 2009 at 05:40:54PM -0700, Jiaying Zhang wrote:
>> --- .pc/fallocate_keepsizse.patch/fs/attr.c   2009-08-28 15:38:46.000000000 -0700
>> +++ fs/attr.c 2009-08-28 17:01:04.000000000 -0700
>> @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
>>       unsigned int ia_valid = attr->ia_valid;
>>
>>       if (ia_valid & ATTR_SIZE &&
>> -         (attr->ia_size != i_size_read(inode)) {
>> +         (attr->ia_size != i_size_read(inode) ||
>> +          (inode->i_flags & FS_KEEPSIZE_FL))) {
>>               int error = vmtruncate(inode, attr->ia_size);
>>               if (error)
>>                       return error;
>
> Instead of doing this in the generic code, it really should be done in
> ext4_setattr.  Technically speaking, we don't actually need the
> FS_KEEPSIZE_FL to solve this problem; instead we can simply have the
> ext4 code look in the extent tree to see if there are any blocks
> mapped beyond the logical block:
>
>       i_size_read(inode) >> inode->i_sb->s_blocksize_bits

Is it relatively cheap to scan the extent tree? Will this add the
overhead to truncate?

>
> Having a flag as Andreas suggested does help with the issue of e2fsck
> noticing whether or not i_size is incorrect (and should be fixed) or
> the file has been extended.  So keeping having the flag is an OK thing
> to do, but we need to be careful about a particularly subtle
> overloading problem.  The flags FS_*_FL as defined in
> include/linux/fs.h are technically only for in-memory use.  The ext4
> on-disk format flags is EXT4_*_FL, and defined in ext4.h.
>
> The flags were originially defined for use in ext2/3/4, but later on
> other filesystems adopted those flags so that e2fsprogs's chattr and
> lsattr programs could be used for their filesystems as well.  It just
> so happens that for ext2/3/4 the on-disk encoding of those flags in
> the in-memory encoding of those flags in i_flags are the same, but
> that means that the flags need to be defined in both places to avoid
> assignment overlaps.  We also need to be clear whether the flags are
> internal flags for ext4's use only, or flags meant for use by all
> filesystems.  This is why the testing for FS_KEEPSIZE_FL in fs/attr is
> particularly bad, if the flag are going to be set in fs/ext4/extents.c.
>
> It's better to define the flag as EXT4_KEEPSIZE_FL, and to use it as
> EXT4_KEEPSIZE_FL, but make a note of that bitfield position as being
> reserved in include/linux/fs.h.

Here is the modified patch based on your suggestions. I stick with the
KEEPSIZE_FL approach that I think can allow us to handle the special
truncation accordingly during fsck. Other file systems can also re-use
this flag when they want to support fallocate with KEEP_SIZE. As you
suggested, I moved the EXT4_KEEPSIZE_FL checking to ext4_setattr
that now calls vmtruncate if the KEEPSIZE flag is set in the i_flag.
Please let me know what you think about this proposed patch.

Thanks a lot!

Jiaying

--- .pc/fallocate_keepsizse.patch/fs/ext4/extents.c	2009-08-31
12:08:10.000000000 -0700
+++ fs/ext4/extents.c	2009-08-31 12:12:16.000000000 -0700
@@ -3095,7 +3095,13 @@ static void ext4_falloc_update_inode(str
 			i_size_write(inode, new_size);
 		if (new_size > EXT4_I(inode)->i_disksize)
 			ext4_update_i_disksize(inode, new_size);
+		inode->i_flags &= ~EXT4_KEEPSIZE_FL;
 	} else {
+		/*
+		 * Mark that we allocate beyond EOF so the subsequent truncate
+		 * can proceed even if the new size is the same as i_size.
+		 */
+		inode->i_flags |= EXT4_KEEPSIZE_FL;
 	}
 }

--- .pc/fallocate_keepsizse.patch/fs/ext4/inode.c	2009-08-31
12:08:10.000000000 -0700
+++ fs/ext4/inode.c	2009-08-31 12:12:16.000000000 -0700
@@ -3973,6 +3973,8 @@ void ext4_truncate(struct inode *inode)
 	if (!ext4_can_truncate(inode))
 		return;

+	inode->i_flags &= ~EXT4_KEEPSIZE_FL;
+
 	if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC))
 		ei->i_state |= EXT4_STATE_DA_ALLOC_CLOSE;

@@ -4807,7 +4809,9 @@ int ext4_setattr(struct dentry *dentry,
 	}

 	if (S_ISREG(inode->i_mode) &&
-	    attr->ia_valid & ATTR_SIZE && attr->ia_size < inode->i_size) {
+	    attr->ia_valid & ATTR_SIZE &&
+	    (attr->ia_size < inode->i_size ||
+	     (inode->i_flags & EXT4_KEEPSIZE_FL))) {
 		handle_t *handle;

 		handle = ext4_journal_start(inode, 3);
@@ -4838,6 +4842,11 @@ int ext4_setattr(struct dentry *dentry,
 				goto err_out;
 			}
 		}
+		if ((inode->i_flags & EXT4_KEEPSIZE_FL)) {
+			rc = vmtruncate(inode, attr->ia_size);
+			if (rc)
+				goto err_out;
+		}
 	}

 	rc = inode_setattr(inode, attr);
--- .pc/fallocate_keepsizse.patch/include/linux/fs.h	2009-08-31
12:08:10.000000000 -0700
+++ include/linux/fs.h	2009-08-31 12:12:16.000000000 -0700
@@ -343,6 +343,7 @@ struct inodes_stat_t {
 #define FS_TOPDIR_FL			0x00020000 /* Top of directory hierarchies*/
 #define FS_EXTENT_FL			0x00080000 /* Extents */
 #define FS_DIRECTIO_FL			0x00100000 /* Use direct i/o */
+#define FS_KEEPSIZE_FL			0x00200000 /* Blocks allocated beyond EOF */
 #define FS_RESERVED_FL			0x80000000 /* reserved for ext2 lib */

 #define FS_FL_USER_VISIBLE		0x0003DFFF /* User visible flags */
--- .pc/fallocate_keepsizse.patch/fs/ext4/ext4.h	2009-08-31
12:08:10.000000000 -0700
+++ fs/ext4/ext4.h	2009-08-31 12:12:16.000000000 -0700
@@ -235,6 +235,7 @@ struct flex_groups {
 #define EXT4_HUGE_FILE_FL               0x00040000 /* Set to each huge file */
 #define EXT4_EXTENTS_FL			0x00080000 /* Inode uses extents */
 #define EXT4_EXT_MIGRATE		0x00100000 /* Inode is migrating */
+#define EXT4_KEEPSIZE_FL		0x00200000 /* Blocks allocated beyond EOF
(bit reserved in fs.h) */
 #define EXT4_RESERVED_FL		0x80000000 /* reserved for ext4 lib */

 #define EXT4_FL_USER_VISIBLE		0x000BDFFF /* User visible flags */


>
>                                                        - Ted
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux