On Thu, Jun 10, 2021 at 1:29 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> On Thu, Jun 10, 2021 at 11:39 AM Christian König
> <christian.koenig@xxxxxxx> wrote:
> > On 10.06.21 at 11:29, Tvrtko Ursulin wrote:
> > > On 09/06/2021 22:29, Jason Ekstrand wrote:
> > >> Ever since 0eafec6d3244 ("drm/i915: Enable lockless lookup of request
> > >> tracking via RCU"), the i915 driver has used SLAB_TYPESAFE_BY_RCU (it
> > >> was called SLAB_DESTROY_BY_RCU at the time) in order to allow RCU on
> > >> i915_request. As nifty as SLAB_TYPESAFE_BY_RCU may be, it comes with
> > >> some serious disclaimers. In particular, objects can get recycled while
> > >> RCU readers are still in-flight. This can be ok if everyone who touches
> > >> these objects knows about the disclaimers and is careful. However,
> > >> because we've chosen to use SLAB_TYPESAFE_BY_RCU for i915_request and
> > >> because i915_request contains a dma_fence, we've leaked
> > >> SLAB_TYPESAFE_BY_RCU and its whole pile of disclaimers to every driver
> > >> in the kernel which may consume a dma_fence.
> > >
> > > I don't think the part about leaking is true...
> > >
> > >> We've tried to keep it somewhat contained by doing most of the hard work
> > >> to prevent access of recycled objects via dma_fence_get_rcu_safe().
> > >> However, a quick grep of kernel sources says that, of the 30 instances
> > >> of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
> > >> It's likely there are bear traps in DRM and related subsystems just
> > >> waiting for someone to accidentally step in them.
> > >
> > > ...because dma_fence_get_rcu_safe appears to be about whether the
> > > *pointer* to the fence itself is rcu protected, not about the fence
> > > object itself.
> >
> > Yes, exactly that.
>
> We do leak, and badly. Any __rcu protected fence pointer where a
> shared fence could show up is affected.
> And the point of dma_fence is
> that they're shareable, and we're inventing ever more ways to do so
> (sync_file, drm_syncobj, implicit fencing maybe soon with
> import/export ioctl on top, in/out fences in CS ioctl, atomic ioctl,
> ...).
>
> So without a full audit anything that uses the following pattern is
> probably busted:
>
> rcu_read_lock();
> fence = rcu_dereference();
> fence = dma_fence_get_rcu();
> rcu_read_unlock();
>
> /* use the fence now that we acquired a full reference */
>
> And I don't mean "you might wait a bit too much" busted, but "this can
> lead to loops in the dma_fence dependency chain, resulting in
> deadlocks" kind of busted. What's worse, the standard rcu lockless
> access pattern is also busted completely:
>
> rcu_read_lock();
> fence = rcu_dereference();
> /* locklessly check the state of fence */
> rcu_read_unlock();
>
> because once you have TYPESAFE_BY_RCU rcu_read_lock doesn't prevent a
> use-after-free anymore. The only thing it guarantees is that your
> fence pointer keeps pointing at either freed memory, or a fence, but
> nothing else. You have to wrap your rcu_dereference and code into a
> seqlock of some kind, either a real one like dma_resv, or an
> open-coded one like dma_fence_get_rcu_safe uses. And yes the latter is
> a specialized seqlock, except it fails to properly document in
> comments where all the required barriers are.
>
> tldr; all the code using dma_fence_get_rcu needs to be assumed to be broken.
>
> Heck this is fragile and tricky enough that i915 shot its own leg off
> routinely (there's a bugfix floating around just now), so not even
> internally we're very good at getting this right.
>
> > > If one has a stable pointer to a fence dma_fence_get_rcu is I think
> > > enough to deal with SLAB_TYPESAFE_BY_RCU used by i915_request (as dma
> > > fence is a base object there). Unless you found a bug in rq field
> > > recycling. But access to the dma fence is all tightly controlled so I
> > > don't get what leaks.
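[Editor's note: the open-coded re-check that dma_fence_get_rcu_safe() performs — take a reference, then verify the __rcu slot still points at the same fence, retrying if the object was recycled underneath — can be sketched as a userspace model. This is a hedged illustration, not kernel code: C11 atomics stand in for the RCU and kref primitives, and the names `fake_fence`, `get_unless_zero`, and `get_rcu_safe` are invented for the sketch.]

```c
/* Userspace model of the re-check idea behind dma_fence_get_rcu_safe().
 * With SLAB_TYPESAFE_BY_RCU the memory a reader loaded from an __rcu slot
 * may be recycled into a *different* fence at any time, so after taking a
 * reference we must verify the slot still points at the same object.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_fence {
    atomic_int refcount; /* stands in for the embedded kref */
};

/* stands in for kref_get_unless_zero(): only take a reference if the
 * object is still alive (refcount > 0) */
static bool get_unless_zero(struct fake_fence *f)
{
    int r = atomic_load(&f->refcount);
    while (r > 0) {
        if (atomic_compare_exchange_weak(&f->refcount, &r, r + 1))
            return true;
    }
    return false;
}

static struct fake_fence *get_rcu_safe(struct fake_fence *_Atomic *slot)
{
    for (;;) {
        struct fake_fence *f = atomic_load(slot); /* rcu_dereference() */
        if (!f)
            return NULL;
        if (!get_unless_zero(f))
            continue; /* fence died under us; reload the slot */
        /* Re-check: if the slot moved on while we took the reference,
         * our reference may be to a recycled object. Drop and retry. */
        if (f == atomic_load(slot))
            return f;
        atomic_fetch_sub(&f->refcount, 1); /* dma_fence_put() */
    }
}
```

Plain dma_fence_get_rcu() skips the re-check loop, which is exactly why it is only safe when the slot cannot be pointed at a recycled fence.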
> > >
> > >> This patch series stops us using SLAB_TYPESAFE_BY_RCU for i915_request
> > >> and, instead, does an RCU-safe slab free via call_rcu(). This should
> > >> let us keep most of the perf benefits of slab allocation while avoiding
> > >> the bear traps inherent in SLAB_TYPESAFE_BY_RCU. It then removes
> > >> support for SLAB_TYPESAFE_BY_RCU from dma_fence entirely.
> > >
> > > According to the rationale behind SLAB_TYPESAFE_BY_RCU traditional RCU
> > > freeing can be a lot more costly so I think we need a clear
> > > justification on why this change is being considered.
> >
> > The problem is that SLAB_TYPESAFE_BY_RCU requires that we use a sequence
> > counter to make sure that we don't grab the reference to a reallocated
> > dma_fence.
> >
> > Updating the sequence counter every time we add a fence now means two
> > additional writes and one additional barrier for an extremely hot path.
> > The extra overhead of RCU freeing is completely negligible compared to that.
> >
> > The good news is that I think if we are just a bit more clever about our
> > handling we can both avoid the sequence counter and keep
> > SLAB_TYPESAFE_BY_RCU around.
>
> You still need a seqlock, or something else that's serving as your
> seqlock. A dma_fence_list behind a single __rcu protected pointer, with
> all subsequent fence pointers _not_ being rcu protected (i.e. full
> references, with the list reallocated on every change) might work. Which
> is a very funny way of implementing something like a seqlock.
>
> And that only covers dma_resv, you _have_ to do this _everywhere_ in
> every driver. Except if you can prove that your __rcu fence pointer
> only ever points at your own driver's fences.
>
> So unless you're volunteering to audit all the drivers, and constantly
> re-audit them (because rcu only guaranteeing type-safety but not
> actually preventing use-after-free is very unusual in the kernel) just
> fixing dma_resv doesn't solve the problem here at all.
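[Editor's note: the "list behind a single __rcu protected pointer" idea above — every mutation publishes a freshly allocated, immutable snapshot, so the fence pointers inside it need no RCU protection of their own — can be modeled in userspace. A hedged sketch with a C11 atomic standing in for rcu_assign_pointer()/rcu_dereference(); `fence_list` and `list_add_fence` are invented names, and the immediate free() is only valid here because the model is single-threaded, where the kernel would use kfree_rcu() to wait out a grace period.]

```c
/* Copy-on-write publication: readers dereference one pointer and see a
 * consistent, immutable snapshot, with no seqlock and no per-entry RCU. */
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct fence_list {
    size_t count;
    void *fences[]; /* full references, not __rcu pointers */
};

static struct fence_list *_Atomic head; /* the single __rcu-style pointer */

/* Writer: build a new snapshot with one extra entry and publish it.
 * The old snapshot is never modified, only replaced. */
static void list_add_fence(void *fence)
{
    struct fence_list *old = atomic_load(&head);
    size_t n = old ? old->count : 0;
    struct fence_list *nl =
        malloc(sizeof(*nl) + (n + 1) * sizeof(void *));

    nl->count = n + 1;
    if (old)
        memcpy(nl->fences, old->fences, n * sizeof(void *));
    nl->fences[n] = fence;

    atomic_store(&head, nl); /* rcu_assign_pointer() in the kernel */
    free(old);               /* kfree_rcu() in the kernel */
}
```

The cost of this scheme is exactly the "funny seqlock" trade-off described above: one allocation and a copy on every change, in exchange for lockless readers that never need to revalidate.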
> > But this needs more code cleanup and abstracting the sequence counter
> > usage in a macro.
>
> The other thing is that this doesn't even make sense for i915 anymore.
> The solution to the "userspace wants to submit bazillion requests"
> problem is direct userspace submit. Current hw doesn't have userspace
> ringbuffer, but we have a pretty clever trick in the works to make
> this possible with current hw, essentially by submitting a CS that
> loops on itself, and then inserting batches into this "ring" by
> latching a conditional branch in this CS. It's not pretty, but it gets
> the job done and outright removes the need for plaid mode throughput
> of i915_request dma fences.

To put it another way: I'm the guy who reviewed the patch which started
this entire TYPESAFE_BY_RCU mess we got ourselves into:

commit 0eafec6d3244802d469712682b0f513963c23eff
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date:   Thu Aug 4 16:32:41 2016 +0100

    drm/i915: Enable lockless lookup of request tracking via RCU

    ...

    Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
    Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
    Cc: "Goel, Akash" <akash.goel@xxxxxxxxx>
    Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
    Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
    Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
    Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@xxxxxxxxxxxxxxxxxx

Looking back this was a mistake. The innocently labelled DESTROY_BY_RCU
tricked me real bad, and we never had any real-world use-case to justify
all the danger this brought not just to i915, but to any driver using
__rcu protected dma_fence access. It's not worth it.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch