Re: [PATCH RFC 102/111] staging: etnaviv: separate GPU pipes from execution state

Lucas Stach <l.stach@xxxxxxxxxxxxxx> · Wed, 08 Apr 2015 10:52:03 +0200

On Tuesday, 07.04.2015 at 22:25 +0100, Russell King - ARM Linux
wrote:
> On Tue, Apr 07, 2015 at 06:59:59PM +0200, Christian Gmeiner wrote:
> > Hi Lucas.
> > 
> > 2015-04-07 17:29 GMT+02:00 Lucas Stach <l.stach@xxxxxxxxxxxxxx>:
> > > And I don't get why each core needs to have its own device node.
> > > IMHO it is purely an implementation decision whether to have one
> > > device node for all cores or one device node per core.
> > 
> > It is an important decision. And I think that one device node per core
> > reflects the hardware design 100%.
> 
> Since when do the interfaces to userspace need to reflect the hardware
> design?
> 
> Isn't the point of having a userspace interface, in part, to abstract
> the hardware design details and provide userspace with something that
> is relatively easy to use without needlessly exposing the variation
> of the underlying hardware?
> 
> Please get away from the idea that userspace interfaces should reflect
> the hardware design.
> 
> > What makes it harder to get right? The needed changes to the kernel
> > driver are not that hard. The user space is another story, but that's
> > because of the render-only thing, where we need to pass (prime)
> > buffers around and do fence syncs etc. In the end I do not see a
> > showstopper in the user space.
> 
> The fence syncs are an issue when you have multiple cores - that's
> something I started to sort out in my patch series, but when you
> appeared to refuse to accept some of the patches, I stopped...
> 
> The problem when you have multiple cores is that one global fence
> event counter, which gets compared to the fence values in each buffer
> object, no longer works.
> 
> Consider this scenario:
> 
> You have two threads, thread A making use of a 2D core, and thread B
> using the 3D core.
> 
> Thread B submits a big long render operation, and the buffers get
> assigned fence number 1.
> 
> Thread A submits a short render operation, and the buffers get assigned
> fence number 2.
> 
> The 2D core finishes, and sends its interrupt.  Etnaviv updates the
> completed fence position to 2.
> 
> At this point, we believe that fence numbers 1 and 2 are now complete,
> despite the 3D core continuing to execute and operate on the buffers
> with fence number 1.
> 
> I'm certain that the fence implementation we currently have can't be
> made to work with multiple cores with a few tweaks - we need something
> better to cater for what is essentially out-of-order completion amongst
> the cores.
> 
> A simple resolution to that _would_ be your argument of exposing each
> GPU as a separate DRM node, because then we get completely separate
> accounting of each - but it needlessly adds an expense in userspace.
> Userspace would have to make multiple calls - to each GPU DRM node -
> to check whether the buffer is busy on any of the GPUs as it may not
> know which GPU could be using the buffer, especially if it got it via
> a dmabuf fd sent over the DRI3 protocol.  To me, that sounds like a
> burden on userspace.
> 
An even simpler solution would be to make the monotonically increasing
fence queue per core and allow each GEM object to be on multiple queues.
Then, when waiting for a buffer to become idle, the kernel can easily
make sure the object is idle on all attached fence queues.

Same principle as with the MMU mappings right now: a single GEM object
mapped to possibly different positions in the VM space of each core.
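
To make the idea concrete, here is a rough sketch in plain userspace C,
not kernel code. Every name in it (fence_queue, gem_object, submit,
complete, gem_is_idle, NUM_CORES) is made up for illustration and is not
taken from the etnaviv driver; it also ignores locking and fence number
wraparound, and assumes fence number 0 means "never used on this core".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES 2 /* hypothetical layout: core 0 = 2D, core 1 = 3D */

/* One monotonically increasing fence queue per core. */
struct fence_queue {
	uint32_t next;      /* last fence number handed out on this core */
	uint32_t completed; /* last fence number this core has signalled */
};

/* Each object remembers its last fence on every queue (0 = never used). */
struct gem_object {
	uint32_t fence[NUM_CORES];
};

/* Queue work on one core and tag the object with that core's fence. */
static uint32_t submit(struct fence_queue *q, int core, struct gem_object *bo)
{
	assert(core >= 0 && core < NUM_CORES);
	bo->fence[core] = ++q[core].next;
	return bo->fence[core];
}

/* Called from the (imaginary) interrupt handler of one core. */
static void complete(struct fence_queue *q, int core, uint32_t fence)
{
	assert(core >= 0 && core < NUM_CORES);
	q[core].completed = fence;
}

/* The object is idle only if it is idle on all attached fence queues. */
static bool gem_is_idle(const struct fence_queue *q,
			const struct gem_object *bo)
{
	for (int core = 0; core < NUM_CORES; core++)
		if (q[core].completed < bo->fence[core])
			return false;
	return true;
}
```

With this, Russell's scenario comes out right: when the 2D core signals
its fence, only the 2D queue advances, so the buffers of the long 3D job
are still reported busy until the 3D core signals its own fence.
Userspace still asks a single device node and the kernel does the loop
over the cores.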

Regards,
Lucas
-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |
