Hello Bart,

On Thu, Aug 20, 2020 at 01:30:38PM -0700, Bart Van Assche wrote:
> On 8/20/20 11:03 AM, Ming Lei wrote:
> > We can't run allocating driver tag and updating tags->rqs[tag] atomically,
> > so stale request may be retrieved from tags->rqs[tag]. More seriously, the
> > stale request may have been freed via updating nr_requests or switching
> > elevator or other use cases.
> >
> > It is one long-term issue, and Jianchao previous worked towards using
> > static_rqs[] for iterating request, one problem is that it can be hard
> > to use when iterating over tagset.
> >
> > This patchset takes another different approach for fixing the issue: cache
> > freed rqs pages and release them until all tags->rqs[] references on these
> > pages are gone.
>
> Hi Ming,
>
> Is this the only possible solution? Would it e.g. be possible to protect the
> code that iterates over all tags with rcu_read_lock() / rcu_read_unlock() and
> to free pages that contain request pointers only after an RCU grace period has
> expired?

That can't work: tags->rqs[] is host-wide, while the request pool belongs to
the scheduler tags and is actually owned by the request queue. When the
elevator is switched on this request queue, or nr_requests is updated, the old
request pool of this queue is freed, but IOs are still being queued from other
request queues in this tagset. An elevator switch or nr_requests update on one
request queue shouldn't, and can't, affect other request queues in the same
tagset. Meanwhile the reference in tags->rqs[] may stay around for quite a
while, and RCU can't cover this case.

Also, we can't simply reset the related tags->rqs[tag] somewhere, because that
may race with a new driver tag allocation. Otherwise some atomic update is
required, but that obviously adds extra load in the fast path.

> Would that perhaps result in a simpler solution?

No, that doesn't work actually.

This patchset looks complicated, but the idea is very simple.
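
As a rough illustration of "cache the freed pool and release it only once no
tags->rqs[] slot points into it", here is a minimal user-space sketch. It is
not the patch code: tag_rqs[], struct rq_pool, pool_retire() and
pool_try_release() are hypothetical names made up for this example, and all
locking and concurrency are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_TAGS      4   /* size of the host-wide tag table (arbitrary) */
#define RQS_PER_POOL 4   /* requests per pool (arbitrary) */

struct request { int tag; };

/* Hypothetical stand-in for one queue's request pool ("rqs pages"). */
struct rq_pool {
    struct request *rqs;  /* backing memory for the requests */
    bool retired;         /* owner is done with it, release is pending */
};

/* Hypothetical stand-in for the host-wide tags->rqs[] table. */
static struct request *tag_rqs[NR_TAGS];

static struct rq_pool *pool_alloc(void)
{
    struct rq_pool *p = malloc(sizeof(*p));
    assert(p);
    p->rqs = calloc(RQS_PER_POOL, sizeof(*p->rqs));
    assert(p->rqs);
    p->retired = false;
    return p;
}

/* True if any tag_rqs[] slot still points at a request in this pool. */
static bool pool_is_referenced(const struct rq_pool *p)
{
    for (size_t i = 0; i < NR_TAGS; i++)
        for (size_t j = 0; j < RQS_PER_POOL; j++)
            if (tag_rqs[i] == &p->rqs[j])
                return true;
    return false;
}

/* E.g. on elevator switch: keep the pool cached instead of freeing it. */
static void pool_retire(struct rq_pool *p)
{
    p->retired = true;
}

/* Free a retired pool only when no tag slot references it any more.
 * Returns true if the pool was actually freed. */
static bool pool_try_release(struct rq_pool *p)
{
    if (!p->retired || pool_is_referenced(p))
        return false;
    free(p->rqs);
    free(p);
    return true;
}
```

In this model a stale tag_rqs[] entry never points at freed memory: the old
pool stays cached until the slot has been overwritten by a request from
another pool, and only then is it released.
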
With this approach, we can extend it to support dynamically allocating the
request pool attached to driver tags. So far, that pool is always
pre-allocated, and it is never used for normal single queue disks.

Thanks,
Ming