Re: [Intel-gfx] [PATCH v3] drm/vgem: Attach sw fences to exported vGEM dma-buf (ioctl)

Daniel Vetter <daniel@xxxxxxxx> · Thu, 14 Jul 2016 14:40:59 +0200

On Thu, Jul 14, 2016 at 11:11:02AM +0100, Chris Wilson wrote:
> On Thu, Jul 14, 2016 at 10:59:04AM +0100, Chris Wilson wrote:
> > On Thu, Jul 14, 2016 at 10:12:17AM +0200, Daniel Vetter wrote:
> > > On Thu, Jul 14, 2016 at 08:04:19AM +0100, Chris Wilson wrote:
> > > > vGEM buffers are useful for passing data between software clients and
> > > > hardware renders. By allowing the user to create and attach fences to
> > > > the exported vGEM buffers (on the dma-buf), the user can implement a
> > > > deferred renderer and queue hardware operations like flipping and then
> > > > signal the buffer readiness (i.e. this allows the user to schedule
> > > > operations out-of-order, but have them complete in-order).
> > > > 
> > > > This also makes it much easier to write tightly controlled testcases for
> > > > dma-buf fencing and signaling between hardware drivers.
> > > > 
> > > > v2: Don't pretend the fences exist in an ordered timeline, but allocate
> > > > a separate fence-context for each fence so that the fences are
> > > > unordered.
> > > > v3: Make the debug output more interesting, and so the signaled status.
> > > > 
> > > > Testcase: igt/vgem_basic/dmabuf-fence
> > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > > Cc: Sean Paul <seanpaul@xxxxxxxxxxxx>
> > > > Cc: Zach Reizner <zachr@xxxxxxxxxx>
> > > > Cc: Gustavo Padovan <gustavo.padovan@xxxxxxxxxxxxxxx>
> > > > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > > > Acked-by: Zach Reizner <zachr@xxxxxxxxxx>
> > > 
> > > One thing I completely forgotten: This allows userspace to hang kernel
> > > drivers. i915 (and other gpu drivers) can recover using hangcheck, but
> > > dumber drivers (v4l, if that ever happens) probably never except such a
> > > case. We've had a similar discusion with the userspace fences exposed in
> > > sw_fence, and decided to move all those ioctl into debugfs. I think we
> > > should do the same for this vgem-based debugging of implicit sync. Sorry
> > > for realizing this this late.
> > 
> > One of the very tests I make is to ensure that we recover from such a
> > hang. I don't see the difference between this any of the other ways
> > userspace can shoot itself (and others) in the foot.
> 
> So one solution would be to make vgem fences automatically timeout (with
> a flag for root to override for the sake of testing hang detection).

The problem is other drivers. E.g. right now atomic helpers assume that
fences will signal, and can't recover if they don't. This is why drivers
where things might fail must have some recovery (hangcheck, timeout) to
make sure dma_fences always signal.

Imo not even root should be allowed to break this, since it could put
drivers into a non-recoverable state. I think this must be restricted to
something known-unsafe-don't-enable-on-production like debugfs.

Other solutions which I don't like:
- Everyone needs to be able to recover. Given how much effort it is to
  just keep i915 hangcheck in working order I think that's totally
  illusionary to assume. At least once world+dog (atomic, v4l, ...) all
  consume/produce fences, subsystems where the usual assumption holds that
  async ops complete.

- Really long timeouts are allowed for root in vgem. Could lead to even
  more fun in testing i915 hangchecks I think, so don't like that much
  either.

I think the best option is to just do the same as we've done for sw_fence,
and move it to debugfs. We could reuse the debugfs sw_fence interface to
create them (gives us more control as a bonus), and just have an ioctl to
attach fences to vgem (which could be unpriviledged).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel