On Wed, Aug 26, 2020 at 08:24:07PM +0800, Ming Lei wrote:
> On Wed, Aug 26, 2020 at 01:03:37PM +0100, John Garry wrote:
> > On 21/08/2020 03:49, Ming Lei wrote:
> > > Hello Bart,
> > >
> > > On Thu, Aug 20, 2020 at 01:30:38PM -0700, Bart Van Assche wrote:
> > > > On 8/20/20 11:03 AM, Ming Lei wrote:
> > > > > We can't allocate a driver tag and update tags->rqs[tag]
> > > > > atomically, so a stale request may be retrieved from
> > > > > tags->rqs[tag]. More seriously, the stale request may have been
> > > > > freed via updating nr_requests, switching elevator, or other
> > > > > use cases.
> > > > >
> > > > > It is a long-term issue, and Jianchao previously worked towards
> > > > > using static_rqs[] for iterating requests; one problem is that
> > > > > it can be hard to use when iterating over the tagset.
> > > > >
> > > > > This patchset takes a different approach to fixing the issue:
> > > > > cache freed rqs pages and only release them once all
> > > > > tags->rqs[] references to these pages are gone.
> > > >
> > > > Hi Ming,
> > > >
> > > > Is this the only possible solution? Would it e.g. be possible to
> > > > protect the code that iterates over all tags with rcu_read_lock()
> > > > / rcu_read_unlock() and to free pages that contain request
> > > > pointers only after an RCU grace period has expired?
> > >
> > > That can't work: tags->rqs[] is host-wide, while the request pool
> > > belongs to the scheduler tags and is actually owned by the request
> > > queue. When the elevator is switched on this request queue, or
> > > nr_requests is updated, the old request pool of this queue is
> > > freed, but IOs are still queued from other request queues in this
> > > tagset. An elevator switch or nr_requests update on one request
> > > queue shouldn't, and can't, affect other request queues in the
> > > same tagset.
> > >
> > > Meantime the reference in tags->rqs[] may stay around for quite a
> > > while, and RCU can't cover this case.
> > >
> > > Also we can't simply reset the related tags->rqs[tag] somewhere,
> > > because that may race with new driver tag allocation.
> >
> > How about iterating over all tags->rqs[] for all scheduler tags when
> > exiting the scheduler, etc., and clearing any scheduler request
> > references, like this:
> >
> > cmpxchg(&hctx->tags->rqs[tag], scheduler_rq, 0);
> >
> > So we atomically NULLify any tags->rqs[] entries which contain a
> > scheduler request of concern, cleaning up any references.
>
> Looks like this approach can work, given cmpxchg() will prevent a new
> store to this address.

However, another process may still be reading this to-be-freed request
via blk_mq_queue_tag_busy_iter() or blk_mq_tagset_busy_iter() while the
NULLify is done and all requests of this scheduler are freed.

Thanks,
Ming
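
For illustration, a rough sketch of the clearing John describes above
might look like the following. This is only a sketch, not code from the
patchset: the helper name blk_mq_clear_sched_rq_refs, the nested walk,
and the placement next to block/blk-mq.c (so that struct blk_mq_tags is
visible) are assumptions.

/*
 * Sketch only: before a scheduler's request pool is freed, walk the
 * shared driver-tag lookup table and atomically clear any slot that
 * still points at one of this queue's scheduler requests.
 */
#include <linux/blk-mq.h>
#include "blk-mq.h"
#include "blk-mq-tag.h"

static void blk_mq_clear_sched_rq_refs(struct blk_mq_hw_ctx *hctx)
{
	struct blk_mq_tags *drv_tags = hctx->tags;
	struct blk_mq_tags *sched_tags = hctx->sched_tags;
	unsigned int i, tag;

	for (i = 0; i < sched_tags->nr_tags; i++) {
		struct request *sched_rq = sched_tags->static_rqs[i];

		if (!sched_rq)
			continue;
		for (tag = 0; tag < drv_tags->nr_tags; tag++)
			/*
			 * Only clears the slot if it still points at this
			 * scheduler request, so a fresh request stored by
			 * a concurrent driver tag allocation is left
			 * alone.
			 */
			cmpxchg(&drv_tags->rqs[tag], sched_rq, NULL);
	}
}

The quadratic walk just keeps the sketch close to the cmpxchg() line
quoted above; a real implementation could instead match entries by the
address range of the pool's pages. Either way it only removes the stale
pointers, which is exactly why the concern about a concurrent
blk_mq_queue_tag_busy_iter()/blk_mq_tagset_busy_iter() reader still
dereferencing the request remains.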