Re: [PATCH] block: Revert "let blkcg_gq grab request queue's refcnt"

On Wed, Feb 01, 2023 at 12:29:41PM +0800, Ming Lei wrote:
> On Wed, Feb 01, 2023 at 09:56:00AM +0800, Ming Lei wrote:
> > On Tue, Jan 31, 2023 at 09:31:36AM -0800, Bart Van Assche wrote:
> > > On 1/30/23 17:52, Ming Lei wrote:
> > > > Hi Bart,
> > > > 
> > > > On Mon, Jan 30, 2023 at 03:22:57PM -0800, Bart Van Assche wrote:
> > > > > Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
> > > > > refcnt") for many request queues the reference count drops to 1 when
> > > > > the request queue is destroyed instead of to 0. In other words, the
> > > > > request queue is leaked. Fix this by reverting that commit.
> > > > 
> > > > When/where do you observe that the reference count drops to 1 instead of 0?
> > > > 
> > > > Do you have a kmem leak log?
> > > > 
> > > > Probably, the last drop is in blkg_free_workfn().
> > > 
> > > Hi Ming,
> > > 
> > > The reference count leak was discovered while I was testing my patch series
> > > that adds support for sub-page limits (https://lore.kernel.org/linux-block/20230130212656.876311-1-bvanassche@xxxxxxx/T/#t).
> > > The second patch in that series adds a counter that tracks the number of
> > > queues that need support for limits below the page size
> > > (sub_page_limit_queues). I noticed that without this patch that counter
> > > increases but never decreases. With this patch applied, that counter drops
> > > back to zero after having run a test that needs support for sub-page limits.
> > 
> > I can now reproduce the issue with scsi_debug, but blkg_release() isn't called,
> > so it looks like a blkcg_gq lifetime issue, since blkcg_exit_disk() really
> > does run.
> 
> The problem is caused by 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()").
> 
> This commit holds the blkg instance until blkcg_rstat_flush() is called,
> which may be delayed until css_release_work_fn().

The following patch can address the blkg leak issue:

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index cb110fc51940..78f855c34746 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -2034,6 +2034,10 @@ void blk_cgroup_bio_start(struct bio *bio)
 	struct blkg_iostat_set *bis;
 	unsigned long flags;
 
+	/* Root-level stats are sourced from system-wide IO stats */
+	if (!cgroup_parent(blkcg->css.cgroup))
+		return;
+
 	cpu = get_cpu();
 	bis = per_cpu_ptr(bio->bi_blkg->iostat_cpu, cpu);
 	flags = u64_stats_update_begin_irqsave(&bis->sync);
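
For readers following the thread, here is a minimal userspace sketch of the
lifetime problem described above. It is a toy model under stated assumptions,
not the kernel code: struct toy_blkg, its plain integer refcnt, and every
toy_* function are hypothetical names invented for illustration. The only
things taken from the thread are that a blkg stays held until the stats flush
runs, and that the patch skips the per-blkg stats update for the root cgroup.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a blkg; the real one uses a percpu_ref. */
struct toy_blkg {
	int refcnt;	/* starts at 1, owned by the request queue */
	bool queued;	/* has stats waiting for a flush */
	bool is_root;	/* belongs to the root cgroup */
};

/* Models the stats update on bio submission: pending stats pin the blkg. */
static void toy_bio_start(struct toy_blkg *g, bool skip_root)
{
	/* Mirrors the early return added by the patch (assumption: root only). */
	if (skip_root && g->is_root)
		return;
	if (!g->queued) {
		g->queued = true;
		g->refcnt++;	/* held until the flush runs */
	}
}

/* Models the deferred flush: it drops the reference taken above. */
static void toy_flush(struct toy_blkg *g)
{
	if (g->queued) {
		g->queued = false;
		g->refcnt--;
	}
}

/* Models queue teardown dropping the owner's initial reference. */
static void toy_destroy(struct toy_blkg *g)
{
	g->refcnt--;
}

/* Returns the reference count left after teardown; 0 means freed. */
static int toy_refs_after_destroy(bool is_root, bool skip_root, bool flush_runs)
{
	struct toy_blkg g = { .refcnt = 1, .queued = false, .is_root = is_root };

	toy_bio_start(&g, skip_root);
	if (flush_runs)
		toy_flush(&g);
	toy_destroy(&g);
	return g.refcnt;
}
```

In this model, toy_refs_after_destroy(true, false, false) returns 1, which
corresponds to the leaked reference when the flush has not run by teardown,
while toy_refs_after_destroy(true, true, false) returns 0 because the root
blkg is never queued in the first place.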

Thanks, 
Ming



