On 10/31/22 6:03 PM, Jens Axboe wrote:
> On 10/31/22 5:35 PM, Al Viro wrote:
>> On Mon, Oct 31, 2022 at 04:42:11PM -0600, Jens Axboe wrote:
>>> On 10/31/22 4:12 PM, Al Viro wrote:
>>>> static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
>>>> {
>>>> 	struct request *last = rq_list_peek(&plug->mq_list);
>>>>
>>>> Suppose it's not NULL...
>>>>
>>>> 	if (!plug->rq_count) {
>>>> 		trace_block_plug(rq->q);
>>>> 	} else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
>>>> 		   (!blk_queue_nomerges(rq->q) &&
>>>> 		    blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
>>>> ... and we went here:
>>>> 		blk_mq_flush_plug_list(plug, false);
>>>> All requests, including the one last points to, might get fed ->queue_rq()
>>>> here. At which point there seems to be nothing to prevent them getting
>>>> completed and freed on another CPU, possibly before we return here.
>>>>
>>>> 		trace_block_plug(rq->q);
>>>> 	}
>>>>
>>>> 	if (!plug->multiple_queues && last && last->q != rq->q)
>>>> ... and here we dereference last.
>>>>
>>>> Shouldn't we reset last to NULL after the call of blk_mq_flush_plug_list()
>>>> above?
>>>
>>> There's no UAF here as the requests aren't freed. We could clear 'last'
>>> to make the code more explicit, and that would avoid any potential
>>> suboptimal behavior with ->multiple_queues being wrong.
>>
>> Umm...
>> Suppose ->has_elevator is false and so's ->multiple_queues.
>> No ->queue_rqs(), so blk_mq_flush_plug_list() grabs rcu_read_lock() and
>> hits blk_mq_plug_issue_direct().
>> blk_mq_plug_issue_direct() picks the first request off the list
>> and passes it to blk_mq_request_issue_directly(), which passes it
>> to __blk_mq_request_issue_directly(). There we grab a tag and
>> proceed to __blk_mq_issue_directly(), which feeds the request to ->queue_rq().
>>
>> What's to stop e.g. a worker on another CPU from picking that sucker,
>> completing it and calling blk_mq_end_request(), which completes all
>> bios involved and calls blk_mq_free_request()?
>>
>> If all of that manages to happen before blk_mq_flush_plug_list()
>> returns to the caller... Sure, you probably won't hit it on bare
>> metal, but if you are in a KVM and this virtual CPU happens to
>> lose the host timeslice... I've seen considerably narrower
>> race windows getting hit on such setups.
>>
>> Am I missing something subtle here? It's been a long time since
>> I've read through that area - as a matter of fact, I'm trying
>> to refresh my memories of the bio_submit()-related code paths
>> at the moment...
>
> With blk-mq, which all drivers are these days, the request memory is
> statically allocated when the driver is set up. I'm not denying that you
> could very well have issued AND completed the request which 'last'
> points to by the time we dereference it, but that won't be a UAF unless
> the device has also been quiesced and removed in between. Which I guess
> could indeed be a possibility, since it is by definition on a different
> queue (->multiple_queues would be true, however, but that's also what
> would be required to reach that far into that statement).
>
> This is different from the older days of a request being literally freed
> when it completes, which is what I initially reacted to.
>
> As mentioned in the original reply, I do think we should just clear
> 'last' as you suggest. But it's not something we've seen on the FB fleet
> of servers, even with the majority of hosts running this code (and on
> VMs).
Forgot to ask - do you want to send a patch for that, or do you just want
me to cook one up with a reported-by for you?

-- 
Jens Axboe
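
For illustration, here is a minimal sketch of the change being discussed: clear 'last' once the plug has been flushed, so that the later 'last->q != rq->q' check can never dereference a request that may already have been issued and completed by then. This is an untested sketch against the blk_add_rq_to_plug() body quoted above, not an actual patch; the surrounding lines may differ between kernel versions.

static void blk_add_rq_to_plug(struct blk_plug *plug, struct request *rq)
{
	struct request *last = rq_list_peek(&plug->mq_list);

	if (!plug->rq_count) {
		trace_block_plug(rq->q);
	} else if (plug->rq_count >= blk_plug_max_rq_count(plug) ||
		   (!blk_queue_nomerges(rq->q) &&
		    blk_rq_bytes(last) >= BLK_PLUG_FLUSH_SIZE)) {
		blk_mq_flush_plug_list(plug, false);
		/*
		 * Everything on the plug list, including the request that
		 * 'last' points to, may have been issued (and completed on
		 * another CPU) by the time the flush returns; forget it so
		 * the check below never looks at it.
		 */
		last = NULL;
		trace_block_plug(rq->q);
	}

	/* 'last' is now either a request still sitting on the plug list, or NULL. */
	if (!plug->multiple_queues && last && last->q != rq->q)
		plug->multiple_queues = true;

	/* ... rest of the function unchanged ... */
}

Besides closing the window Al describes, clearing the pointer also keeps ->multiple_queues from being set based on a stale request, which is the "suboptimal behavior" mentioned earlier in the thread.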