Re: [PATCH 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle()

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Tue, 14 Apr 2020 08:44:47 -0700

On Tue, Apr 14, 2020 at 04:19:01AM +0000, Luis Chamberlain wrote:
> Block devices are refcounted to ensure that once the final user goes away
> the device can be cleaned up properly by the lower layers. The block device's
> request_queue structure is also refcounted; however, if the last
> blk_put_queue() is called in atomic context, the block layer has
> to defer removal.
> 
> By refcounting the block device during the use of blkcg_schedule_throttle(),
> we ensure two things:
> 
> 1) the block device remains available during the call
> 2) we avoid having to deal with the fact we're using the
>    request_queue structure in atomic context, since the last
>    blk_put_queue() will be called upon disk_release(), *after*
>    our own bdput().
> 
> This means this code path is *not* going to remove the request_queue
> structure, since we ensure that a later disk_release() in an upper
> layer will be the one to release the request_queue structure for us.
> 
> Cc: Bart Van Assche <bvanassche@xxxxxxx>
> Cc: Omar Sandoval <osandov@xxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: Nicolai Stange <nstange@xxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: yu kuai <yukuai3@xxxxxxxxxx>
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
>  mm/swapfile.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6659ab563448..9285ff6030ca 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3753,6 +3753,7 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
>  void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
>  				  gfp_t gfp_mask)
>  {
> +	struct block_device *bdev;
>  	struct swap_info_struct *si, *next;
>  	if (!(gfp_mask & __GFP_IO) || !memcg)
>  		return;
> @@ -3771,8 +3772,17 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
>  	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
>  				  avail_lists[node]) {
>  		if (si->bdev) {
> -			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
> -						true);
> +			bdev = bdgrab(si->bdev);
> +			if (!bdev)
> +				continue;
> +			/*
> +			 * By adding our own bdgrab() we ensure the queue
> +			 * sticks around until disk_release(), and so we ensure
> +			 * our release of the request_queue does not happen in
> +			 * atomic context.
> +			 */
> +			blkcg_schedule_throttle(bdev_get_queue(bdev), true);
> +			bdput(bdev);

I don't understand the atomic part of the comment.  How does
bdgrab/bdput help us there?