Re: [PATCH v3 17/18] ext4: make punch hole code path work with bigalloc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 20 Apr 2013, Jan Kara wrote:

> Date: Sat, 20 Apr 2013 15:42:41 +0200
> From: Jan Kara <jack@xxxxxxx>
> To: Lukas Czerner <lczerner@xxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx,
>     linux-fsdevel@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with
>     bigalloc
> 
> On Tue 09-04-13 11:14:26, Lukas Czerner wrote:
> > Currently punch hole is disabled in file systems with bigalloc
> > feature enabled. However the recent changes in punch hole patch should
> > make it easier to support punching holes on bigalloc enabled file
> > systems.
> > 
> > This commit changes partial_cluster handling in ext4_remove_blocks(),
> > ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
> > partial_cluster is unsigned long long type and it makes sure that we
> > will free the partial cluster if all extents has been released from that
> > cluster. However it has been specifically designed only for truncate.
> > 
> > With punch hole we can be freeing just some extents in the cluster
> > leaving the rest untouched. So we have to make sure that we will notice
> > cluster which still has some extents. To do this I've changed
> > partial_cluster to be signed long long type. The only scenario where
> > this could be a problem is when cluster_size == block size, however in
> > that case there would not be any partial clusters so we're safe. For
> > bigger clusters the signed type is enough. Now we use the negative value
> > in partial_cluster to mark such cluster used, hence we know that we must
> > not free it even if all other extents has been freed from such cluster.
> > 
> > This scenario can be described in simple diagram:
> > 
> > |FFF...FF..FF.UUU|
> >  ^----------^
> >   punch hole
> > 
> > . - free space
> > | - cluster boundary
> > F - freed extent
> > U - used extent
> > 
> > Also update respective tracepoints to use signed long long type for
> > partial_cluster.
>   The patch looks OK. You can add:
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> 
>   Just a minor nit - sometimes you use 'signed long long', sometimes 'long
> long int', sometimes just 'long long'. In kernel we tend to always use just
> 'long long' so it would be good to clean that up.

Sure, I can do that.

Thanks!
-Lukas

> 
> 								Honza
> > 
> > Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
> > ---
> >  fs/ext4/extents.c           |   69 +++++++++++++++++++++++++++++++-----------
> >  include/trace/events/ext4.h |   25 ++++++++-------
> >  2 files changed, 64 insertions(+), 30 deletions(-)
> > 
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 9023b76..577c4f5 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -2362,7 +2362,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
> >  
> >  static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> >  			      struct ext4_extent *ex,
> > -			      ext4_fsblk_t *partial_cluster,
> > +			      signed long long *partial_cluster,
> >  			      ext4_lblk_t from, ext4_lblk_t to)
> >  {
> >  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > @@ -2391,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> >  	 * partial cluster here.
> >  	 */
> >  	pblk = ext4_ext_pblock(ex) + ee_len - 1;
> > -	if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> > +	if ((*partial_cluster > 0) &&
> > +	    (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> >  		ext4_free_blocks(handle, inode, NULL,
> >  				 EXT4_C2B(sbi, *partial_cluster),
> >  				 sbi->s_cluster_ratio, flags);
> > @@ -2417,23 +2418,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> >  	    && to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
> >  		/* tail removal */
> >  		ext4_lblk_t num;
> > +		unsigned int unaligned;
> >  
> >  		num = le32_to_cpu(ex->ee_block) + ee_len - from;
> >  		pblk = ext4_ext_pblock(ex) + ee_len - num;
> > -		ext_debug("free last %u blocks starting %llu\n", num, pblk);
> > +		/*
> > +		 * Usually we want to free partial cluster at the end of the
> > +		 * extent, except for the situation when the cluster is still
> > +		 * used by any other extent (partial_cluster is negative).
> > +		 */
> > +		if (*partial_cluster < 0 &&
> > +		    -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
> > +			flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
> > +
> > +		ext_debug("free last %u blocks starting %llu partial %lld\n",
> > +			  num, pblk, *partial_cluster);
> >  		ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
> >  		/*
> >  		 * If the block range to be freed didn't start at the
> >  		 * beginning of a cluster, and we removed the entire
> > -		 * extent, save the partial cluster here, since we
> > -		 * might need to delete if we determine that the
> > -		 * truncate operation has removed all of the blocks in
> > -		 * the cluster.
> > +		 * extent and the cluster is not used by any other extent,
> > +		 * save the partial cluster here, since we might need to
> > +		 * delete if we determine that the truncate operation has
> > +		 * removed all of the blocks in the cluster.
> > +		 *
> > +		 * On the other hand, if we did not manage to free the whole
> > +		 * extent, we have to mark the cluster as used (store negative
> > +		 * cluster number in partial_cluster).
> >  		 */
> > -		if (pblk & (sbi->s_cluster_ratio - 1) &&
> > -		    (ee_len == num))
> > +		unaligned = pblk & (sbi->s_cluster_ratio - 1);
> > +		if (unaligned && (ee_len == num) &&
> > +		    (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
> >  			*partial_cluster = EXT4_B2C(sbi, pblk);
> > -		else
> > +		else if (unaligned)
> > +			*partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
> > +		else if (*partial_cluster > 0)
> >  			*partial_cluster = 0;
> >  	} else
> >  		ext4_error(sbi->s_sb, "strange request: removal(2) "
> > @@ -2451,12 +2470,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> >   * @handle: The journal handle
> >   * @inode:  The files inode
> >   * @path:   The path to the leaf
> > + * @partial_cluster: The cluster which we'll have to free if all extents
> > + *                   has been released from it. It gets negative in case
> > + *                   that the cluster is still used.
> >   * @start:  The first block to remove
> >   * @end:   The last block to remove
> >   */
> >  static int
> >  ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > -		 struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
> > +		 struct ext4_ext_path *path,
> > +		 signed long long *partial_cluster,
> >  		 ext4_lblk_t start, ext4_lblk_t end)
> >  {
> >  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > @@ -2469,6 +2492,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >  	unsigned short ex_ee_len;
> >  	unsigned uninitialized = 0;
> >  	struct ext4_extent *ex;
> > +	ext4_fsblk_t pblk;
> >  
> >  	/* the header must be checked already in ext4_ext_remove_space() */
> >  	ext_debug("truncate since %u in leaf to %u\n", start, end);
> > @@ -2507,6 +2531,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >  
> >  		/* If this extent is beyond the end of the hole, skip it */
> >  		if (end < ex_ee_block) {
> > +			/*
> > +			 * We're going to skip this extent and move to another,
> > +			 * so if this extent is not cluster aligned we have
> > +			 * to mark the current cluster as used to avoid
> > +			 * accidentally freeing it later on
> > +			 */
> > +			pblk = ext4_ext_pblock(ex);
> > +			if (pblk & (sbi->s_cluster_ratio - 1))
> > +				*partial_cluster =
> > +					-((long long)EXT4_B2C(sbi, pblk));
> >  			ex--;
> >  			ex_ee_block = le32_to_cpu(ex->ee_block);
> >  			ex_ee_len = ext4_ext_get_actual_len(ex);
> > @@ -2582,7 +2616,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >  					sizeof(struct ext4_extent));
> >  			}
> >  			le16_add_cpu(&eh->eh_entries, -1);
> > -		} else
> > +		} else if (*partial_cluster > 0)
> >  			*partial_cluster = 0;
> >  
> >  		err = ext4_ext_dirty(handle, inode, path + depth);
> > @@ -2600,11 +2634,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >  		err = ext4_ext_correct_indexes(handle, inode, path);
> >  
> >  	/*
> > -	 * If there is still a entry in the leaf node, check to see if
> > -	 * it references the partial cluster.  This is the only place
> > -	 * where it could; if it doesn't, we can free the cluster.
> > +	 * Free the partial cluster only if the current extent does not
> > +	 * reference it. Otherwise we might free used cluster.
> >  	 */
> > -	if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
> > +	if (*partial_cluster > 0 &&
> >  	    (EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
> >  	     *partial_cluster)) {
> >  		int flags = EXT4_FREE_BLOCKS_FORGET;
> > @@ -2654,7 +2687,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
> >  	struct super_block *sb = inode->i_sb;
> >  	int depth = ext_depth(inode);
> >  	struct ext4_ext_path *path = NULL;
> > -	ext4_fsblk_t partial_cluster = 0;
> > +	signed long long partial_cluster = 0;
> >  	handle_t *handle;
> >  	int i = 0, err = 0;
> >  
> > @@ -2838,7 +2871,7 @@ again:
> >  	/* If we still have something in the partial cluster and we have removed
> >  	 * even the first extent, then we should free the blocks in the partial
> >  	 * cluster as well. */
> > -	if (partial_cluster && path->p_hdr->eh_entries == 0) {
> > +	if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
> >  		int flags = EXT4_FREE_BLOCKS_FORGET;
> >  
> >  		if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
> > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> > index c92500c..5028d05 100644
> > --- a/include/trace/events/ext4.h
> > +++ b/include/trace/events/ext4.h
> > @@ -1928,7 +1928,7 @@ TRACE_EVENT(ext4_ext_show_extent,
> >  TRACE_EVENT(ext4_remove_blocks,
> >  	    TP_PROTO(struct inode *inode, struct ext4_extent *ex,
> >  		ext4_lblk_t from, ext4_fsblk_t to,
> > -		ext4_fsblk_t partial_cluster),
> > +		long long int partial_cluster),
> >  
> >  	TP_ARGS(inode, ex, from, to, partial_cluster),
> >  
> > @@ -1937,7 +1937,7 @@ TRACE_EVENT(ext4_remove_blocks,
> >  		__field(	ino_t,		ino	)
> >  		__field(	ext4_lblk_t,	from	)
> >  		__field(	ext4_lblk_t,	to	)
> > -		__field(	ext4_fsblk_t,	partial	)
> > +		__field(	long long int,	partial	)
> >  		__field(	ext4_fsblk_t,	ee_pblk	)
> >  		__field(	ext4_lblk_t,	ee_lblk	)
> >  		__field(	unsigned short,	ee_len	)
> > @@ -1955,7 +1955,7 @@ TRACE_EVENT(ext4_remove_blocks,
> >  	),
> >  
> >  	TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
> > -		  "from %u to %u partial_cluster %u",
> > +		  "from %u to %u partial_cluster %lld",
> >  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> >  		  (unsigned long) __entry->ino,
> >  		  (unsigned) __entry->ee_lblk,
> > @@ -1963,19 +1963,20 @@ TRACE_EVENT(ext4_remove_blocks,
> >  		  (unsigned short) __entry->ee_len,
> >  		  (unsigned) __entry->from,
> >  		  (unsigned) __entry->to,
> > -		  (unsigned) __entry->partial)
> > +		  (long long int) __entry->partial)
> >  );
> >  
> >  TRACE_EVENT(ext4_ext_rm_leaf,
> >  	TP_PROTO(struct inode *inode, ext4_lblk_t start,
> > -		 struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
> > +		 struct ext4_extent *ex,
> > +		 long long int partial_cluster),
> >  
> >  	TP_ARGS(inode, start, ex, partial_cluster),
> >  
> >  	TP_STRUCT__entry(
> >  		__field(	dev_t,		dev	)
> >  		__field(	ino_t,		ino	)
> > -		__field(	ext4_fsblk_t,	partial	)
> > +		__field(	long long int,	partial	)
> >  		__field(	ext4_lblk_t,	start	)
> >  		__field(	ext4_lblk_t,	ee_lblk	)
> >  		__field(	ext4_fsblk_t,	ee_pblk	)
> > @@ -1993,14 +1994,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
> >  	),
> >  
> >  	TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
> > -		  "partial_cluster %u",
> > +		  "partial_cluster %lld",
> >  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> >  		  (unsigned long) __entry->ino,
> >  		  (unsigned) __entry->start,
> >  		  (unsigned) __entry->ee_lblk,
> >  		  (unsigned long long) __entry->ee_pblk,
> >  		  (unsigned short) __entry->ee_len,
> > -		  (unsigned) __entry->partial)
> > +		  (long long int) __entry->partial)
> >  );
> >  
> >  TRACE_EVENT(ext4_ext_rm_idx,
> > @@ -2058,7 +2059,7 @@ TRACE_EVENT(ext4_ext_remove_space,
> >  
> >  TRACE_EVENT(ext4_ext_remove_space_done,
> >  	TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
> > -		 int depth, ext4_lblk_t partial, unsigned short eh_entries),
> > +		 int depth, long long int partial, unsigned short eh_entries),
> >  
> >  	TP_ARGS(inode, start, end, depth, partial, eh_entries),
> >  
> > @@ -2068,7 +2069,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> >  		__field(	ext4_lblk_t,	start		)
> >  		__field(	ext4_lblk_t,	end		)
> >  		__field(	int,		depth		)
> > -		__field(	ext4_lblk_t,	partial		)
> > +		__field(	long long int,	partial		)
> >  		__field(	unsigned short,	eh_entries	)
> >  	),
> >  
> > @@ -2082,14 +2083,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> >  		__entry->eh_entries	= eh_entries;
> >  	),
> >  
> > -	TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
> > +	TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
> >  		  "remaining_entries %u",
> >  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> >  		  (unsigned long) __entry->ino,
> >  		  (unsigned) __entry->start,
> >  		  (unsigned) __entry->end,
> >  		  __entry->depth,
> > -		  (unsigned) __entry->partial,
> > +		  (long long int) __entry->partial,
> >  		  (unsigned short) __entry->eh_entries)
> >  );
> >  
> > -- 
> > 1.7.7.6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux