On Thu, Jun 10, 2021 at 1:29 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> On Thu, Jun 10, 2021 at 11:39 AM Christian König
> <christian.koenig@xxxxxxx> wrote:
> > On 10.06.21 at 11:29, Tvrtko Ursulin wrote:
> > > On 09/06/2021 22:29, Jason Ekstrand wrote:
> > >> Ever since 0eafec6d3244 ("drm/i915: Enable lockless lookup of request
> > >> tracking via RCU"), the i915 driver has used SLAB_TYPESAFE_BY_RCU (it
> > >> was called SLAB_DESTROY_BY_RCU at the time) in order to allow RCU on
> > >> i915_request. As nifty as SLAB_TYPESAFE_BY_RCU may be, it comes with
> > >> some serious disclaimers. In particular, objects can get recycled while
> > >> RCU readers are still in-flight. This can be ok if everyone who touches
> > >> these objects knows about the disclaimers and is careful. However,
> > >> because we've chosen to use SLAB_TYPESAFE_BY_RCU for i915_request and
> > >> because i915_request contains a dma_fence, we've leaked
> > >> SLAB_TYPESAFE_BY_RCU and its whole pile of disclaimers to every driver
> > >> in the kernel which may consume a dma_fence.
> > >
> > > I don't think the part about leaking is true...
> > >
> > >> We've tried to keep it somewhat contained by doing most of the hard work
> > >> to prevent access of recycled objects via dma_fence_get_rcu_safe().
> > >> However, a quick grep of kernel sources says that, of the 30 instances
> > >> of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
> > >> It's likely there are bear traps in DRM and related subsystems just
> > >> waiting for someone to accidentally step in them.
> > >
> > > ...because dma_fence_get_rcu_safe appears to be about whether the
> > > *pointer* to the fence itself is rcu protected, not about the fence
> > > object itself.
> >
> > Yes, exactly that.
>
> We do leak, and badly. Any __rcu protected fence pointer where a
> shared fence could show up is affected.
> And the point of dma_fence is
> that they're shareable, and we're inventing ever more ways to do so
> (sync_file, drm_syncobj, implicit fencing maybe soon with
> import/export ioctl on top, in/out fences in CS ioctl, atomic ioctl,
> ...).
>
> So without a full audit anything that uses the following pattern is
> probably busted:
>
> rcu_read_lock();
> fence = rcu_dereference();
> fence = dma_fence_get_rcu();
> rcu_read_unlock();
>
> /* use the fence now that we acquired a full reference */
>
> And I don't mean "you might wait a bit too much" busted, but "this can
> lead to loops in the dma_fence dependency chain, resulting in
> deadlocks" kind of busted. What's worse, the standard rcu lockless
> access pattern is also busted completely:
>
> rcu_read_lock();
> fence = rcu_dereference();
> /* locklessly check the state of fence */
> rcu_read_unlock();
>
> because once you have TYPESAFE_BY_RCU rcu_read_lock doesn't prevent a
> use-after-free anymore. The only thing it guarantees is that your
> fence pointer keeps pointing at either freed memory, or a fence, but
> nothing else. You have to wrap your rcu_dereference and code into a
> seqlock of some kind, either a real one like dma_resv, or an
> open-coded one like dma_fence_get_rcu_safe uses. And yes the latter is
> a specialized seqlock, except it fails to properly document in
> comments where all the required barriers are.
>
> tldr; all the code using dma_fence_get_rcu needs to be assumed to be broken.
>
> Heck this is fragile and tricky enough that i915 shot its own leg off
> routinely (there's a bugfix floating around just now), so not even
> internally we're very good at getting this right.
>
> > > If one has a stable pointer to a fence dma_fence_get_rcu is I think
> > > enough to deal with SLAB_TYPESAFE_BY_RCU used by i915_request (as dma
> > > fence is a base object there). Unless you found a bug in rq field
> > > recycling. But access to the dma fence is all tightly controlled so I
> > > don't get what leaks.
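[Editor's note: the open-coded re-check that dma_fence_get_rcu_safe() performs — take a reference, then verify the __rcu slot still points at the same fence, retrying if the object was recycled underneath — can be sketched as a userspace model. This is a hedged illustration, not kernel code: C11 atomics stand in for the RCU and kref primitives, and the names `fake_fence`, `get_unless_zero`, and `get_rcu_safe` are invented for the sketch.]

```c
/* Userspace model of the re-check idea behind dma_fence_get_rcu_safe().
 * With SLAB_TYPESAFE_BY_RCU the memory a reader loaded from an __rcu slot
 * may be recycled into a *different* fence at any time, so after taking a
 * reference we must verify the slot still points at the same object.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_fence {
    atomic_int refcount; /* stands in for the embedded kref */
};

/* stands in for kref_get_unless_zero(): only take a reference if the
 * object is still alive (refcount > 0) */
static bool get_unless_zero(struct fake_fence *f)
{
    int r = atomic_load(&f->refcount);
    while (r > 0) {
        if (atomic_compare_exchange_weak(&f->refcount, &r, r + 1))
            return true;
    }
    return false;
}

static struct fake_fence *get_rcu_safe(struct fake_fence *_Atomic *slot)
{
    for (;;) {
        struct fake_fence *f = atomic_load(slot); /* rcu_dereference() */
        if (!f)
            return NULL;
        if (!get_unless_zero(f))
            continue; /* fence died under us; reload the slot */
        /* Re-check: if the slot moved on while we took the reference,
         * our reference may be to a recycled object. Drop and retry. */
        if (f == atomic_load(slot))
            return f;
        atomic_fetch_sub(&f->refcount, 1); /* dma_fence_put() */
    }
}
```

Plain dma_fence_get_rcu() skips the re-check loop, which is exactly why it is only safe when the slot cannot be pointed at a recycled fence.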
> > >
> > >> This patch series stops us using SLAB_TYPESAFE_BY_RCU for i915_request
> > >> and, instead, does an RCU-safe slab free via call_rcu(). This should
> > >> let us keep most of the perf benefits of slab allocation while avoiding
> > >> the bear traps inherent in SLAB_TYPESAFE_BY_RCU. It then removes
> > >> support for SLAB_TYPESAFE_BY_RCU from dma_fence entirely.
> > >
> > > According to the rationale behind SLAB_TYPESAFE_BY_RCU traditional RCU
> > > freeing can be a lot more costly so I think we need a clear
> > > justification on why this change is being considered.
> >
> > The problem is that SLAB_TYPESAFE_BY_RCU requires that we use a sequence
> > counter to make sure that we don't grab the reference to a reallocated
> > dma_fence.
> >
> > Updating the sequence counter every time we add a fence now means two
> > additional writes and one additional barrier for an extremely hot path.
> > The extra overhead of RCU freeing is completely negligible compared to that.
> >
> > The good news is that I think if we are just a bit more clever about our
> > handling we can both avoid the sequence counter and keep
> > SLAB_TYPESAFE_BY_RCU around.
>
> You still need a seqlock, or something else that's serving as your
> seqlock. A dma_fence_list behind a single __rcu protected pointer, with
> all subsequent fence pointers _not_ being rcu protected (i.e. full
> references, with the list reallocated on every change) might work. Which
> is a very funny way of implementing something like a seqlock.
>
> And that only covers dma_resv, you _have_ to do this _everywhere_ in
> every driver. Except if you can prove that your __rcu fence pointer
> only ever points at your own driver's fences.
>
> So unless you're volunteering to audit all the drivers, and constantly
> re-audit them (because rcu only guaranteeing type-safety but not
> actually preventing use-after-free is very unusual in the kernel) just
> fixing dma_resv doesn't solve the problem here at all.
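[Editor's note: the "list behind a single __rcu protected pointer" idea above — every mutation publishes a freshly allocated, immutable snapshot, so the fence pointers inside it need no RCU protection of their own — can be modeled in userspace. A hedged sketch with a C11 atomic standing in for rcu_assign_pointer()/rcu_dereference(); `fence_list` and `list_add_fence` are invented names, and the immediate free() is only valid here because the model is single-threaded, where the kernel would use kfree_rcu() to wait out a grace period.]

```c
/* Copy-on-write publication: readers dereference one pointer and see a
 * consistent, immutable snapshot, with no seqlock and no per-entry RCU. */
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct fence_list {
    size_t count;
    void *fences[]; /* full references, not __rcu pointers */
};

static struct fence_list *_Atomic head; /* the single __rcu-style pointer */

/* Writer: build a new snapshot with one extra entry and publish it.
 * The old snapshot is never modified, only replaced. */
static void list_add_fence(void *fence)
{
    struct fence_list *old = atomic_load(&head);
    size_t n = old ? old->count : 0;
    struct fence_list *nl =
        malloc(sizeof(*nl) + (n + 1) * sizeof(void *));

    nl->count = n + 1;
    if (old)
        memcpy(nl->fences, old->fences, n * sizeof(void *));
    nl->fences[n] = fence;

    atomic_store(&head, nl); /* rcu_assign_pointer() in the kernel */
    free(old);               /* kfree_rcu() in the kernel */
}
```

The cost of this scheme is exactly the "funny seqlock" trade-off described above: one allocation and a copy on every change, in exchange for lockless readers that never need to revalidate.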
> > But this needs more code cleanup and abstracting the sequence counter
> > usage in a macro.
>
> The other thing is that this doesn't even make sense for i915 anymore.
> The solution to the "userspace wants to submit bazillion requests"
> problem is direct userspace submit. Current hw doesn't have userspace
> ringbuffer, but we have a pretty clever trick in the works to make
> this possible with current hw, essentially by submitting a CS that
> loops on itself, and then inserting batches into this "ring" by
> latching a conditional branch in this CS. It's not pretty, but it gets
> the job done and outright removes the need for plaid mode throughput
> of i915_request dma fences.

To put it another way: I'm the guy who reviewed the patch which started
this entire TYPESAFE_BY_RCU mess we got ourselves into:

commit 0eafec6d3244802d469712682b0f513963c23eff
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date:   Thu Aug 4 16:32:41 2016 +0100

    drm/i915: Enable lockless lookup of request tracking via RCU

    ...

    Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
    Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
    Cc: "Goel, Akash" <akash.goel@xxxxxxxxx>
    Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
    Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
    Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
    Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@xxxxxxxxxxxxxxxxxx

Looking back this was a mistake. The innocently labelled DESTROY_BY_RCU
tricked me real bad, and we never had any real-world use-case to justify
all the danger this brought not just to i915, but to any driver using
__rcu protected dma_fence access. It's not worth it.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch