On Tue, Apr 14, 2020 at 04:19:01AM +0000, Luis Chamberlain wrote:
> block devices are refcounted so to ensure once its final user goes away it
> can be cleaned up by the lower layers properly. The block device's
> request_queue structure is also refcounted, however, if the last
> blk_put_queue() is called under atomic context the block layer has
> to defer removal.
>
> By refcounting the block device during the use of blkcg_schedule_throttle(),
> we ensure ensure two things:
>
> 1) the block device remains available during the call
> 2) we ensure avoid having to deal with the fact we're using the
>    request_queue structure in atomic context, since the last
>    blk_put_queue() will be called upon disk_release(), *after*
>    our own bdput().
>
> This means this code path is *not* going to remove the request_queue
> structure, as we are ensuring some later upper layer disk_release()
> will be the one to release the request_queue structure for us.
>
> Cc: Bart Van Assche <bvanassche@xxxxxxx>
> Cc: Omar Sandoval <osandov@xxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: Nicolai Stange <nstange@xxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
> Cc: yu kuai <yukuai3@xxxxxxxxxx>
> Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> ---
>  mm/swapfile.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6659ab563448..9285ff6030ca 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3753,6 +3753,7 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
>  void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
>                                    gfp_t gfp_mask)
>  {
> +        struct block_device *bdev;
>          struct swap_info_struct *si, *next;
>          if (!(gfp_mask & __GFP_IO) || !memcg)
>                  return;
> @@ -3771,8 +3772,17 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
>          plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
>                                    avail_lists[node]) {
>                  if (si->bdev) {
> -                        blkcg_schedule_throttle(bdev_get_queue(si->bdev),
> -                                                true);
> +                        bdev = bdgrab(si->bdev);
> +                        if (!bdev)
> +                                continue;
> +                        /*
> +                         * By adding our own bdgrab() we ensure the queue
> +                         * sticks around until disk_release(), and so we ensure
> +                         * our release of the request_queue does not happen in
> +                         * atomic context.
> +                         */
> +                        blkcg_schedule_throttle(bdev_get_queue(bdev), true);
> +                        bdput(bdev);

I don't understand the atomic part of the comment.  How does
bdgrab/bdput help us there?
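
To make the question concrete, here is a rough userspace model of the
ordering the commit message describes. It is only a sketch: the toy_*
names are hypothetical stand-ins and not the kernel's types or functions.
It also assumes that the swap device already holds its own reference on
the block device, and that dropping the last block device reference is
what leads to disk_release() and the final queue put. Whether the real
bdgrab()/bdput() give that guarantee is exactly what is being asked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these are kernel types or functions. */
struct toy_queue {
	int refs;
	bool released;
	bool released_in_throttle;	/* final put came from the throttle path */
};

struct toy_bdev {
	int refs;
	struct toy_queue *q;
};

/* Set while the modelled (atomic) throttle path is running. */
static bool in_throttle_path;

/* Models blk_put_queue(): the last put releases the queue. */
static void toy_queue_put(struct toy_queue *q)
{
	if (--q->refs == 0) {
		q->released = true;
		q->released_in_throttle = in_throttle_path;
	}
}

/* Models bdgrab(): take an extra reference on the block device. */
static struct toy_bdev *toy_bdgrab(struct toy_bdev *b)
{
	b->refs++;
	return b;
}

/*
 * Models bdput(). ASSUMPTION: the last bdev reference going away is what
 * triggers disk_release(), which then drops the queue reference.
 */
static void toy_bdput(struct toy_bdev *b)
{
	if (--b->refs == 0)
		toy_queue_put(b->q);
}

/* Models the patched loop body in mem_cgroup_throttle_swaprate(). */
static void toy_throttle(struct toy_bdev *si_bdev)
{
	struct toy_bdev *b;

	in_throttle_path = true;
	b = toy_bdgrab(si_bdev);
	/* blkcg_schedule_throttle(bdev_get_queue(b), true) would run here */
	toy_bdput(b);
	in_throttle_path = false;
}
```

In this model the put in toy_throttle() is never the final one as long
as the swap device keeps its own reference, so the queue is released
later, outside the throttle path. The model says nothing about whether
the kernel's reference chain really looks like this.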