Re: [RFC PATCH] drm/sched: Fix a UAF on drm_sched_fence::sched

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 4 Sep 2024 11:46:54 +0200
Simona Vetter <simona.vetter@xxxxxxxx> wrote:

> On Wed, Sep 04, 2024 at 09:40:36AM +0200, Christian König wrote:
> > Am 03.09.24 um 10:13 schrieb Simona Vetter:  
> > > [SNIP]  
> > > > > So I think the issue is much, much bigger, and there's more. And the
> > > > > issue is I think a fundamental design issue of dma_fence itself, not
> > > > > individual users.  
> > > > IIRC both Alex and me pointed out this issue on the very first dma_fence
> > > > code and nobody really cared.  
> > > I guess way back then we didn't really sort out any of the hotunplug
> > > issues, and there wasn't any fw ctx schedulers at least on our horizons
> > > yet. Thin excuse, I know ...  
> > 
> > Well it's just when you have a bee string and a broken leg, what do you
> > attend first? :)  
> 
> Yeah ...
> 
> > > > >    I think at the core it's two constraints:
> > > > > 
> > > > > - dma_fence can stick around practically forever in varios container
> > > > >     objects. We only garbage collect when someone looks, and not even then
> > > > >     consistently.
> > > > > 
> > > > > - fences are meant to be cheap, so they do not have the big refcount going
> > > > >     on like other shared objects like dma_buf
> > > > > 
> > > > > Specifically there's also no refcounting on the module itself with the  
> > > > > ->owner and try_module_get stuff. So even if we fix all these issues on  
> > > > > the data structure lifetime side of things, you might still oops calling
> > > > > into dma_fence->ops->release.
> > > > > 
> > > > > Oops.  
> > > > Yes, exactly that. I'm a bit surprised that you realize that only now :)
> > > > 
> > > > We have the issue for at least 10 years or so and it pops up every now and
> > > > then on my desk because people complain that unloading amdgpu crashes.  
> > > Yeah I knew about the issue. The new idea that popped into my mind is that
> > > I think we cannot plug this properly unless we do it in dma_fence.c for
> > > everyone, and essentially reshape the lifetime rules for that from yolo
> > > to something actually well-defined.
> > > 
> > > Kinda similar work to how dma_resv locking rules and fence book-keeping
> > > were unified to something that actually works across drivers ...  
> > 
> > Well sounds like I've just got more items on my TODO list.
> > 
> > I have patches waiting to be send out going into this direction anyway, will
> > try to get them out by the end of the week and then we can discuss what's
> > still missing.  
> 
> Quick addition, another motivator from the panthor userspace submit
> discussion: If the preempt ctx fence concept spreads, that's another
> non-drm_sched fence that drivers will need and are pretty much guaranteed
> to get wrong.
> 
> Also maybe Boris volunteers to help out with some of the work here?

Sure, I can review/test what Christian comes up with, since he already
seems to have a draft for the new implementation.




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux