Re: Question on UAPI for fences

Yeah, right. Providing an fd that can be reassigned to a new fence would indeed reduce the create/close overhead.

But it would still be more overhead than, for example, a simple on-demand growing ring buffer, where userspace uses 64-bit sequence numbers to refer to a fence in the kernel.

Apart from that, I'm pretty sure that when we do the syncing completely in userspace we will need more fences open at the same time than there are fds available by default.

As long as our internal handle- or sequence-based fences are easily convertible to a fence fd, I don't really see a problem with that. I'm going to hack that approach into my prototype and then we can see how bad the code looks after all.
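
To make that concrete, here is a minimal sketch of how such a sequence-number based fence could look from the userspace side; all of the drv_* calls below are hypothetical, nothing like this exists in the kernel today:

#include <stdint.h>

/* Hypothetical userspace view of a sequence-number based fence: the
 * kernel keeps a per-client ring of fence slots that grows on demand,
 * and userspace only ever passes around the 64-bit number.  An fd is
 * only requested when the fence has to cross a process boundary. */
struct client_fence {
        uint64_t seq;   /* monotonically increasing, returned by the CS ioctl */
};

/* Hypothetical driver entry points, for illustration only. */
extern int drv_cs_submit(int drm_fd, const void *cs, uint64_t *out_seq);
extern int drv_fence_wait(int drm_fd, uint64_t seq, uint64_t timeout_ns);
extern int drv_fence_to_fd(int drm_fd, uint64_t seq);   /* export as sync fd */

static int submit_and_export(int drm_fd, const void *cs)
{
        uint64_t seq;

        if (drv_cs_submit(drm_fd, cs, &seq))
                return -1;

        /* Only pay the fd create/close cost when another process needs it. */
        return drv_fence_to_fd(drm_fd, seq);
}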

Christian.

On 14.09.2014 at 02:32, Marek Olšák wrote:
BTW, we can recycle fences in userspace just like we recycle buffers.
That should make the create/close overhead non-existent.
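
Purely for illustration, such recycling could look like the buffer caches we already have, e.g. a small free list that hands back fence fds instead of closing them (a sketch, assuming the kernel gains some way to re-arm an existing fd as discussed above):

#include <unistd.h>

/* Hypothetical fence-fd cache, mirroring how we recycle buffers: a
 * signalled fence fd is parked on a free list and reused for the next
 * submission instead of being closed and recreated. */
#define FENCE_CACHE_SIZE 64

struct fence_cache {
        int fds[FENCE_CACHE_SIZE];
        int count;
};

static int fence_cache_get(struct fence_cache *c)
{
        if (c->count)
                return c->fds[--c->count];
        return -1;              /* empty: caller creates a fresh fd */
}

static void fence_cache_put(struct fence_cache *c, int fd)
{
        if (c->count < FENCE_CACHE_SIZE)
                c->fds[c->count++] = fd;
        else
                close(fd);      /* cache full, really drop it */
}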

Marek

On Sat, Sep 13, 2014 at 2:25 PM, Christian König
<christian.koenig@xxxxxxx> wrote:
Doing such combining and cleaning up fds as soon as they have been passed
on should keep each application's fd usage fairly small.
Yeah, but this is exactly what we wanted to avoid internally because of the
IOCTL overhead.

And thinking more about it, for our driver-internal use we will definitely
hit some limitations on the number of FDs in use and the overhead of
creating and closing them. With the execution model we are targeting for
the long term we will need something like 10k fences per second or more.

How about this: we use a per-client identifier for the fence internally,
and when we need to expose it to somebody else we export it as a sync
point fd. That is very similar to how we currently keep GEM handles
internally and export a DMA-buf fd when we need to expose them.
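
As a rough sketch, the fence path would then mirror what we already do for buffers with PRIME; the buffer half below uses the real libdrm call, the fence half is purely hypothetical:

#include <stdint.h>
#include <xf86drm.h>    /* drmPrimeHandleToFD() */

/* Buffers today: private GEM handle inside the process, DMA-buf fd
 * only when the buffer has to leave it. */
static int export_bo(int drm_fd, uint32_t gem_handle)
{
        int dmabuf_fd;

        if (drmPrimeHandleToFD(drm_fd, gem_handle, DRM_CLOEXEC, &dmabuf_fd))
                return -1;
        return dmabuf_fd;
}

/* Fences tomorrow (hypothetical call): private per-client fence id
 * inside the process, sync point fd only when another process needs
 * to wait on it. */
extern int drv_fence_handle_to_fd(int drm_fd, uint32_t fence_handle);

static int export_fence(int drm_fd, uint32_t fence_handle)
{
        return drv_fence_handle_to_fd(drm_fd, fence_handle);
}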

Regards,
Christian.

On 12.09.2014 at 18:38, John Harrison wrote:

On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
pass in a list of fences to wait for before beginning a command
submission.
The Android implementation has a mechanism for combining multiple sync
points into a brand new single sync point. Thus APIs only ever need to take
a single fd, not a list of them. If the user wants an operation to wait for
multiple events to occur, it is up to them to request the combined version
first. They can then happily close the individual fds that have been
combined and only keep the one big one around. Indeed, even that fd can be
closed once it has been passed on to some other API.

Doing such combining and cleaning up fds as soon as they have been passed
on should keep each application's fd usage fairly small.
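
For reference, with the Android sync API that combining step looks roughly like the sketch below, assuming the sync_merge()/sync_wait() helpers from Android's libsync; the exact ioctl behind them lives in the staging sync driver:

#include <unistd.h>
#include <sync/sync.h>  /* Android libsync: sync_merge(), sync_wait() */

/* Merge two sync point fds into a single new one and drop the
 * originals; only the combined fd needs to stay around. */
static int combine_and_close(int fd_a, int fd_b)
{
        int merged = sync_merge("combined", fd_a, fd_b);

        if (merged < 0)
                return -1;

        close(fd_a);
        close(fd_b);
        return merged;  /* can itself be closed once passed on */
}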


On 12/09/2014 17:08, Christian König wrote:
As Daniel said, using an fd is most likely the way we want to do it, but
this remains vague.
Separating out the discussion of whether it should be an fd or not: using
an fd sounds fine to me in general, but I have some concerns as well.

For example, what was the maximum number of open FDs per process again?
Could that become a problem? etc.

Please comment,
Christian.

On 12.09.2014 at 18:03, Jerome Glisse wrote:
On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
On 12.09.2014 at 17:48, Jerome Glisse wrote:
On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian König wrote:
On 12.09.2014 at 17:33, Jerome Glisse wrote:
On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote:
On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse
<j.glisse@xxxxxxxxx> wrote:
On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote:
On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter <daniel@xxxxxxxx>
wrote:
On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote:
Hello everyone,

To allow concurrent buffer access by different engines beyond the multiple
readers/single writer model that we currently use in radeon and other
drivers, we need some kind of synchronization object exposed to userspace.

My initial patch set for this used (or rather abused) zero-sized GEM
buffers as fence handles. This obviously isn't the best way of doing it
(too much overhead, rather ugly, etc.), and Jerome commented on it
accordingly.

So what should a driver expose instead? Android sync points?
Something else?
I think actually exposing the struct fence objects as an fd, using Android
syncpts (or at least something compatible with them), is the way to go. The
problem is that it's super-hard to get the Android guys out of hiding for
this :(

Adding a bunch of people in the hopes that something sticks.
More people.
Just to reiterate, exposing such a thing while still using a command
stream ioctl that does implicit synchronization is a waste; you can only
get the lowest common denominator, which is implicit synchronization. So I
do not see the point of such an API if you are not also adding a new CS
ioctl with an explicit contract that it does not do any kind of
synchronization (it could be almost exactly the same code, modulo not
waiting for previous commands to complete).
Our thinking was to allow explicit sync from a single process, but to sync
implicitly between processes.
This is a BIG NAK if you are using the same ioctl, as it would mean you
are changing the userspace API, or at least userspace expectations. Adding
a new CS flag might do the trick, but it should not be about inter-process
or anything special; it is just implicit sync or no synchronization.
Converting userspace is not that big a deal either, it can be broken into
several steps, e.g. Mesa uses explicit synchronization all the time while
the DDX uses implicit.
The thinking here is that we need to stay backward compatible for DRI2/3
and support all kinds of different use cases, like old DDX and new Mesa,
or old Mesa and new DDX, etc.

So in my prototype, if the kernel sees any access to a BO from two
different clients, it falls back to the old behavior of implicitly
synchronizing access to the same buffer object. That might not be the
fastest approach, but as far as I can see it is conservative and so should
work under all conditions.
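
In rough pseudo-kernel-code (made-up names, not the actual prototype), the fallback rule amounts to something like this:

#include <stdbool.h>

/* Minimal made-up types, for illustration only. */
struct drm_file;
struct driver_bo {
        struct drm_file *last_client;   /* who queued the last access */
};

static bool need_implicit_wait(const struct driver_bo *bo,
                               const struct drm_file *file,
                               bool explicit_sync)
{
        if (!explicit_sync)
                return true;    /* old userspace: keep the implicit behavior */
        if (bo->last_client && bo->last_client != file)
                return true;    /* BO shared between clients: be conservative */
        return false;           /* same client, explicit sync: skip the wait */
}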

Apart from that, the plan so far was to just hide this feature behind a
couple of command submission flags and new chunks.
Just to reproduce the IRC discussion, I think it's a lot simpler and not
that complex. For an explicit CS ioctl you do not wait for any previous
fence of any of the buffers referenced in the CS ioctl, but you still
associate a new fence with all the buffer objects referenced in it. So if
the next ioctl is an implicit-sync ioctl, it will wait and synchronize
properly with the previous explicit CS ioctl. Hence you can easily have a
mix in userspace; the thing is, you only get the benefit once enough of
your userspace is using explicit sync.
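
In code that rule is roughly the following (made-up names, just to illustrate the per-BO handling):

#include <stdbool.h>

/* Minimal made-up types, for illustration only. */
struct fence;
struct bo {
        struct fence *last_fence;
};

/* An explicit submission skips the wait on a BO's previous fence, but
 * every submission still publishes its new fence on every BO it
 * touches, so a later implicit submission synchronizes correctly. */
static void process_bo(struct bo *bo, struct fence *new_fence,
                       bool explicit_sync,
                       void (*wait)(struct fence *))
{
        if (!explicit_sync && bo->last_fence)
                wait(bo->last_fence);   /* implicit: order against previous work */

        bo->last_fence = new_fence;     /* always record this submission's fence */
}
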
Yes, that's exactly what my patches currently implement.

The only difference is that in the current plan I implemented it as a
per-BO flag for the command submission, but that was just for testing.
Having a single flag to switch between implicit and explicit
synchronization for the whole CS IOCTL would do equally well.
Doing it per BO sounds bogus to me. But otherwise yes, we are in
agreement.
As Daniel said, using an fd is most likely the way we want to do it, but
this remains vague.

Note that you still need a way for an explicit CS ioctl to wait on a
previous "explicit" fence, so you need some API to expose a fence per CS
submission.
Exactly, that's what this mail thread is all about.

As Daniel correctly noted, you need something like the ability to get a
fence back as the result of a command submission, as well as to pass in a
list of fences to wait for before beginning a command submission.
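
One possible shape for that interface, with all names and fields made up purely for illustration, would be a new chunk for incoming fences plus a returned fence handle:

#include <stdint.h>

/* Hypothetical CS chunk: fences to wait for before execution starts. */
struct drv_cs_in_fences {
        uint32_t count;
        uint32_t pad;
        uint64_t fences;        /* userspace pointer to 'count' fence handles/fds */
};

/* Hypothetical result: the fence produced by this submission. */
struct drv_cs_out_fence {
        uint32_t handle;        /* filled in by the kernel on return */
        uint32_t flags;         /* e.g. "also export as a sync point fd" */
};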

At least it looks like we are all on the same general line here; it's just
that nobody has a good idea of what the details should look like.

Regards,
Christian.

Cheers,
Jérôme

Regards,
Christian.

Cheers,
Jérôme

Alex

Also, one thing that Android sync points do not have, AFAICT, is a way to
schedule synchronization as part of a CS ioctl, so the CPU never has to be
involved for command streams that only deal with one GPU (assuming the
driver and hardware can do such a trick).

Cheers,
Jérôme

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel




