Re: [PATCH 0/5] blk-mq: fix use-after-free on stale request

Ming Lei <ming.lei@xxxxxxxxxx> · Fri, 21 Aug 2020 10:49:49 +0800

Hello Bart,

On Thu, Aug 20, 2020 at 01:30:38PM -0700, Bart Van Assche wrote:
> On 8/20/20 11:03 AM, Ming Lei wrote:
> > We can't allocate the driver tag and update tags->rqs[tag] atomically,
> > so a stale request may be retrieved from tags->rqs[tag]. More seriously, the
> > stale request may already have been freed by updating nr_requests, switching
> > the elevator, or other use cases.
> > 
> > This is a long-standing issue. Jianchao previously worked towards using
> > static_rqs[] for iterating requests; one problem with that is it can be hard
> > to use when iterating over a tagset.
> > 
> > This patchset takes a different approach to fixing the issue: cache
> > freed rqs pages and release them only after all tags->rqs[] references to
> > these pages are gone.
> 
> Hi Ming,
> 
> Is this the only possible solution? Would it e.g. be possible to protect the
> code that iterates over all tags with rcu_read_lock() / rcu_read_unlock() and
> to free pages that contain request pointers only after an RCU grace period has
> expired?

That can't work: tags->rqs[] is host-wide, while the request pool belongs to the
scheduler tags and is actually owned by the request queue. When the elevator is
switched on this request queue, or nr_requests is updated, the old request pool
of this queue is freed, but IOs are still being queued from other request queues
in this tagset. An elevator switch or nr_requests update on one request queue
shouldn't (and can't) affect other request queues in the same tagset.

Meanwhile, a reference in tags->rqs[] may stay around for a long time, and RCU
can't cover this case.
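To make the lifetime problem concrete, here is a minimal userspace sketch. All names (struct tags, struct rq_pool, assign_driver_tag(), free_pool(), count_stale()) are hypothetical simplifications, not the real blk-mq code, and "freeing" a pool only sets a flag so the example contains no real use-after-free. It shows that freeing one queue's pool leaves a tags->rqs[] slot pointing into that pool until the tag happens to be reallocated. RCU only helps when a pointer is unpublished before the grace period starts; here nothing unpublishes it, so an iterator can still load it after any grace period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_TAGS 4 /* hypothetical queue depth */
#define POOL_SZ 4 /* hypothetical requests per scheduler pool */

/* Hypothetical, heavily simplified stand-ins for blk-mq structures. */
struct request {
	int tag;
};

/* Host-wide driver tags: shared by every request queue in the tagset. */
struct tags {
	struct request *rqs[NR_TAGS];
};

/* Per-queue request pool, owned by one queue's scheduler tags. */
struct rq_pool {
	struct request static_rqs[POOL_SZ];
	bool freed; /* models releasing the pool without a real free() */
};

/* Fast path: once a driver tag is allocated, publish the request. */
static void assign_driver_tag(struct tags *t, int tag, struct request *rq)
{
	assert(tag >= 0 && tag < NR_TAGS);
	rq->tag = tag;
	t->rqs[tag] = rq; /* separate step from the tag allocation itself */
}

/* True if p points into the pool's request array (address-range check). */
static bool in_pool(const struct rq_pool *pool, const struct request *p)
{
	uintptr_t lo = (uintptr_t)pool->static_rqs;
	uintptr_t hi = (uintptr_t)(pool->static_rqs + POOL_SZ);

	return (uintptr_t)p >= lo && (uintptr_t)p < hi;
}

/* Elevator switch / nr_requests update on ONE queue releases only that
 * queue's pool; the shared tags->rqs[] array is left untouched. */
static void free_pool(struct rq_pool *pool)
{
	pool->freed = true;
}

/* Count tags->rqs[] slots that still point into a freed pool. */
static int count_stale(const struct tags *t, const struct rq_pool *pool)
{
	int n = 0;

	for (int i = 0; i < NR_TAGS; i++)
		if (pool->freed && t->rqs[i] && in_pool(pool, t->rqs[i]))
			n++;
	return n;
}
```

For example, assigning tag 0 from queue A's pool, freeing that pool, and then calling count_stale() reports one stale slot; only a later assign_driver_tag() on tag 0 removes it.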

Also, we can't simply reset the related tags->rqs[tag] somewhere, because that may
race with a new driver tag allocation. Otherwise some atomic update is required,
but that obviously introduces extra load in the fast path.
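One hypothetical form of the atomic update mentioned above, sketched with C11 atomics: teardown clears a slot with compare-and-swap only if it still holds the request being freed, so it cannot clobber a request that another queue has just assigned to the same tag, whereas a plain reset can. The structures and helper names are illustrative assumptions. Note that the slots become shared atomics that the fast path must also access atomically, and that the CAS alone does not protect an iterator that already loaded the old pointer before it was cleared.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS 4 /* hypothetical queue depth */

struct request {
	int tag;
};

/* Hypothetical shared tag map; _Atomic slots mean both the fast path and
 * the teardown path use atomic accesses. */
struct tags {
	_Atomic(struct request *) rqs[NR_TAGS];
};

/* Fast path: publish rq for a freshly allocated driver tag. */
static void assign_driver_tag(struct tags *t, int tag, struct request *rq)
{
	assert(tag >= 0 && tag < NR_TAGS);
	rq->tag = tag;
	atomic_store_explicit(&t->rqs[tag], rq, memory_order_release);
}

/* Naive reset: wipes the slot even if another queue has just assigned a
 * new request to the same tag. */
static void reset_slot_naive(struct tags *t, int tag)
{
	atomic_store(&t->rqs[tag], NULL);
}

/* Teardown path: clear the slot only if it still holds the request being
 * freed. Returns true if the slot was cleared. */
static bool clear_if_stale(struct tags *t, int tag, struct request *old)
{
	struct request *expected = old;

	return atomic_compare_exchange_strong(&t->rqs[tag], &expected, NULL);
}
```

If tag 2 is reused by another queue before teardown runs, clear_if_stale() leaves the new request in place, while reset_slot_naive() would drop it from the map.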

> Would that perhaps result in a simpler solution?

No, that doesn't work actually.

This patchset looks complicated, but the idea is very simple. With this
approach, we can also extend it to support dynamically allocating the request
pool attached to driver tags. So far, that pool is always pre-allocated, and it
is never used for normal single queue disks.
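A rough sketch of the caching idea, under stated assumptions: instead of being freed, a pool is parked on a hypothetical list (struct cached_pool, cache_pool() and reclaim_pools() are illustrative names, not the patchset's functions), and each cached pool is released only once no tags->rqs[] slot points into its address range. It is single-threaded and deliberately ignores when reclaim runs and how it would synchronize with driver tag allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_TAGS 4 /* hypothetical queue depth */

struct request {
	int tag;
};

struct tags {
	struct request *rqs[NR_TAGS];
};

/* A freed request pool parked on a cache list instead of being released. */
struct cached_pool {
	struct request *base; /* start of the pool's request array */
	size_t nr;            /* number of requests in the pool */
	struct cached_pool *next;
};

/* True if any tags->rqs[] slot still points into [base, base + nr). */
static bool pool_referenced(const struct tags *t, const struct cached_pool *cp)
{
	uintptr_t lo = (uintptr_t)cp->base;
	uintptr_t hi = (uintptr_t)(cp->base + cp->nr);

	for (int i = 0; i < NR_TAGS; i++) {
		uintptr_t p = (uintptr_t)t->rqs[i];

		if (p >= lo && p < hi)
			return true;
	}
	return false;
}

/* Elevator switch / nr_requests update: cache the old pool, don't free it. */
static struct cached_pool *cache_pool(struct cached_pool *head,
				      struct request *base, size_t nr)
{
	struct cached_pool *cp = malloc(sizeof(*cp));

	if (!cp)
		abort();
	cp->base = base;
	cp->nr = nr;
	cp->next = head;
	return cp;
}

/* Release every cached pool no longer referenced from tags->rqs[].
 * Returns the new list head; *released is the number of pools freed. */
static struct cached_pool *reclaim_pools(const struct tags *t,
					 struct cached_pool *head, int *released)
{
	struct cached_pool **pp = &head;

	*released = 0;
	while (*pp) {
		struct cached_pool *cp = *pp;

		if (pool_referenced(t, cp)) {
			pp = &cp->next; /* still referenced: keep it cached */
			continue;
		}
		*pp = cp->next; /* unlink, then release the pool's memory */
		free(cp->base);
		free(cp);
		(*released)++;
	}
	return head;
}
```

With one stale slot pointing into pool A, a reclaim pass frees an unreferenced pool B but keeps A cached; once that slot is reassigned to a request outside A, the next pass frees A as well.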

Thanks,
Ming