Re: [PATCH v2] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri 13-10-23 13:52:08, Jingbo Xu wrote:
> The cgwb cleanup routine will try to release the dying cgwb by switching
> the attached inodes.  It fetches the attached inodes from wb->b_attached
> list, omitting the fact that inodes only with dirty timestamps reside in
> wb->b_dirty_time list, which is the case when lazytime is enabled.  This
> causes enormous zombie memory cgroup when lazytime is enabled, as inodes
> with dirty timestamps can not be switched to a live cgwb for a long time.
> 
> It is reasonable not to switch cgwb for inodes with dirty data, as
> otherwise it may break the bandwidth restrictions.  However since the
> writeback of inode metadata is not accounted for, let's also switch
> inodes with dirty timestamps to avoid zombie memory and block cgroups
> when laztytime is enabled.
> 
> Fixs: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
  ^^^ Fixes

> Signed-off-by: Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx>

Otherwise looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

								Honza

> ---
> v2: add comment explaining why switching for inodes with dirty
> timestamps is needed
> 
> v1: https://lore.kernel.org/all/20231011084228.77615-1-jefflexu@xxxxxxxxxxxxxxxxx/
> ---
>  fs/fs-writeback.c | 41 +++++++++++++++++++++++++++++------------
>  1 file changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index c1af01b2c42d..1767493dffda 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -613,6 +613,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
>  	kfree(isw);
>  }
>  
> +static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
> +				   struct list_head *list, int *nr)
> +{
> +	struct inode *inode;
> +
> +	list_for_each_entry(inode, list, i_io_list) {
> +		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
> +			continue;
> +
> +		isw->inodes[*nr] = inode;
> +		(*nr)++;
> +
> +		if (*nr >= WB_MAX_INODES_PER_ISW - 1)
> +			return true;
> +	}
> +	return false;
> +}
> +
>  /**
>   * cleanup_offline_cgwb - detach associated inodes
>   * @wb: target wb
> @@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
>  {
>  	struct cgroup_subsys_state *memcg_css;
>  	struct inode_switch_wbs_context *isw;
> -	struct inode *inode;
>  	int nr;
>  	bool restart = false;
>  
> @@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
>  
>  	nr = 0;
>  	spin_lock(&wb->list_lock);
> -	list_for_each_entry(inode, &wb->b_attached, i_io_list) {
> -		if (!inode_prepare_wbs_switch(inode, isw->new_wb))
> -			continue;
> -
> -		isw->inodes[nr++] = inode;
> -
> -		if (nr >= WB_MAX_INODES_PER_ISW - 1) {
> -			restart = true;
> -			break;
> -		}
> -	}
> +	/*
> +	 * In addition to the inodes that have completed writeback, also switch
> +	 * cgwbs for those inodes only with dirty timestamps. Otherwise, those
> +	 * inodes won't be written back for a long time when lazytime is
> +	 * enabled, and thus pinning the dying cgwbs. It won't break the
> +	 * bandwidth restrictions, as writeback of inode metadata is not
> +	 * accounted for.
> +	 */
> +	restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
> +	if (!restart)
> +		restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
>  	spin_unlock(&wb->list_lock);
>  
>  	/* no attached inodes? bail out */
> -- 
> 2.19.1.6.gb485710b
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux