Re: [PATCH for-rc] RDMA/cma: fix race between addr_handler and resolve_route

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On 3 Apr 2020, at 21:36, Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
> 
> On Fri, Apr 03, 2020 at 09:07:10PM +0200, Håkon Bugge wrote:
>> 
>> 
>>> On 3 Apr 2020, at 20:57, Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
>>> 
>>> On Fri, Apr 03, 2020 at 08:43:28PM +0200, Håkon Bugge wrote:
>>>> A syzkaller test hits a NULL pointer dereference in
>>>> rdma_resolve_route():
>>> 
>>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
>>> 
>>> This commit in 5.7 probably fixes this:
>> 
>> I think it will not. The mutex in 7c11910783a1 ("RDMA/ucma: Put a
>> lock around every call to the rdma_cm layer") will not prevent
>> addr_handler() to run concurrently with rdma_resolve_route(), right?
> 
> Hmm. Perhaps so. But your patch isn't nearly enough if that is the
> case, you've only considered resolve_route, but it could run
> concurrently with *anything*, with the usual problems.

I was about to argue my case, but looking more into the code I tend to agree with you more and more. I thought, at least, that the rdma_foo_bar() functions had atomic checks on the state and bailed out if the state was inappropriate, but that is not the case all the time. 

And the code that transitions the state from FOO to BAR, letting others observe BAR, and then conditionally setting it back to FOO isn't exactly great...

> Plus addr_handler calls rdma_destroy_id().. Oh wow is that ever
> completely screwed up. Sigh.
> 
> Probably the simplest answer is to have ucma fail operations that are
> not permitted while an async_handler is pending. I'm guessing the only
> operation that would be valid is rdma_destroy_id?

Yes, that would take care of this class of errors, I agree. On a real systems (vs. the syzkaller setup), you may receive (almost any) CM packet ay any time, and that concurrency we must handle.

Shall I make a v2 base on next based on this idea, or do you have something coming?

>> And, I also suspect 7c11910783a1 to have major performance
>> impact. But, that's a different story.
> 
> *shrug* I no longer care. The work to fix this in a performant way is
> enormous and nobody wants to do it. 
> 
> Until that time we are taking a 'Big Lock' approach to all concurrancy
> problems with rdma_cm as this code is *completely* broken for
> concurrency.

Have you considered putting the hammer inside the cm_id instead of the context? Would that allow more concurrency and still avoid the 10ish syzkaller bugs?

> Which is why I'm not taking this patch..

At least I made a good explanation of the bug ;-)


Thxs, Håkon





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux