On Wed, Feb 01, 2023 at 09:56:00AM +0800, Ming Lei wrote:
> On Tue, Jan 31, 2023 at 09:31:36AM -0800, Bart Van Assche wrote:
> > On 1/30/23 17:52, Ming Lei wrote:
> > > Hi Bart,
> > >
> > > On Mon, Jan 30, 2023 at 03:22:57PM -0800, Bart Van Assche wrote:
> > > > Since commit 0a9a25ca7843 ("block: let blkcg_gq grab request queue's
> > > > refcnt") for many request queues the reference count drops to 1 when
> > > > the request queue is destroyed instead of to 0. In other words, the
> > > > request queue is leaked. Fix this by reverting that commit.
> > >
> > > When/where you observe that the reference count drops to 1 instead of 0?
> > >
> > > Do you have kmem leak log?
> > >
> > > Probably, the last drop is in blkg_free_workfn().
> >
> > Hi Ming,
> >
> > The reference count leak was discovered while I was testing my patch series
> > that adds support for sub-page limits (https://lore.kernel.org/linux-block/20230130212656.876311-1-bvanassche@xxxxxxx/T/#t).
> > The second patch in that series adds a counter that tracks the number of
> > queues that need support for limits below the page size
> > (sub_page_limit_queues). I noticed that without this patch that counter
> > increases but never decreases. With this patch applied, that counter drops
> > back to zero after having run a test that needs support for sub-page limits.
>
> I can reproduce the issue by scsi_debug now, but blkg_release() isn't called,
> so looks like one blkcg_gq lifetime issue since blkcg_exit_disk() is really
> run.

The problem is caused by commit 3b8cc6298724 ("blk-cgroup: Optimize
blkcg_rstat_flush()").

That commit holds the blkg instance until blkcg_rstat_flush() is called,
and that call may be delayed until css_release_work_fn().

Thanks,
Ming