BTW, we can recycle fences in userspace just like we recycle buffers. That should make the create/close overhead non-existent.

Marek

On Sat, Sep 13, 2014 at 2:25 PM, Christian König <christian.koenig@xxxxxxx> wrote:
>> Doing such combining and cleaning up fds as soon as they have been passed on should keep each application's fd usage fairly small.
>
> Yeah, but this is exactly what we wanted to avoid internally because of the IOCTL overhead.
>
> And thinking more about it, for our driver-internal use we will definitely hit some limitations on the number of FDs in use and the overhead of creating and closing them. With the execution model we target for the long term we will need something like 10k fences per second or more.
>
> How about this: we use an identifier per client for the fence internally, and when we need to somehow expose it to somebody else we export it as a sync point fd. Very similar to how we currently have GEM handles internally and export a DMA_buf fd when we need to expose them.
>
> Regards,
> Christian.
>
> On 12.09.2014 at 18:38, John Harrison wrote:
>> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
>> > pass in a list of fences to wait for before beginning a command submission.
>>
>> The Android implementation has a mechanism for combining multiple sync points into a brand new single sync point. Thus APIs only ever need to take in a single fd, not a list of them. If the user wants an operation to wait for multiple events to occur, it is up to them to request the combined version first. They can then happily close the individual fds that have been combined and keep only the one big one around. Indeed, even that fd can be closed once it has been passed on to some other API.
>>
>> Doing such combining and cleaning up fds as soon as they have been passed on should keep each application's fd usage fairly small.
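As a concrete illustration of the merge-and-close pattern described above, a minimal userspace sketch could look like the following. It is written against Android's libsync helpers (sync_merge() wraps the sync driver's SYNC_IOC_MERGE ioctl); the header location and the origin of the input fence fds are assumptions, since how a CS ioctl would hand such fds out is exactly the open question in this thread.

/* Merge-and-close pattern for sync point fds, sketched against
 * Android's libsync.  The two input fds are assumed to come from
 * earlier command submissions. */
#include <sync/sync.h>   /* sync_merge() */
#include <unistd.h>      /* close() */

/* Combine two fences into one fd and drop the originals immediately,
 * so the per-process fd count stays bounded. */
static int combine_fences(int fence_a, int fence_b)
{
        int merged = sync_merge("cs-deps", fence_a, fence_b);

        if (merged < 0)
                return -1;      /* keep the originals on failure */

        /* The merged fence keeps the underlying sync points alive,
         * so the individual fds are no longer needed. */
        close(fence_a);
        close(fence_b);
        return merged;
}

The merged fd can itself be closed as soon as it has been handed to whatever API consumes it, which is what keeps the fd footprint small.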
>>
>> On 12/09/2014 17:08, Christian König wrote:
>>>>
>>>> As Daniel said, using an fd is most likely the way we want to do it, but this remains vague.
>>>
>>> Separating out the discussion of whether it should be an fd or not: using an fd sounds fine to me in general, but I have some concerns as well.
>>>
>>> For example, what was the maximum number of open FDs per process again? Could that become a problem? etc...
>>>
>>> Please comment,
>>> Christian.
>>>
>>> On 12.09.2014 at 18:03, Jerome Glisse wrote:
>>>> On Fri, Sep 12, 2014 at 05:58:09PM +0200, Christian König wrote:
>>>>> On 12.09.2014 at 17:48, Jerome Glisse wrote:
>>>>>> On Fri, Sep 12, 2014 at 05:42:57PM +0200, Christian König wrote:
>>>>>>> On 12.09.2014 at 17:33, Jerome Glisse wrote:
>>>>>>>> On Fri, Sep 12, 2014 at 11:25:12AM -0400, Alex Deucher wrote:
>>>>>>>>> On Fri, Sep 12, 2014 at 10:50 AM, Jerome Glisse <j.glisse@xxxxxxxxx> wrote:
>>>>>>>>>> On Fri, Sep 12, 2014 at 04:43:44PM +0200, Daniel Vetter wrote:
>>>>>>>>>>> On Fri, Sep 12, 2014 at 4:09 PM, Daniel Vetter <daniel@xxxxxxxx> wrote:
>>>>>>>>>>>> On Fri, Sep 12, 2014 at 03:23:22PM +0200, Christian König wrote:
>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>
>>>>>>>>>>>>> to allow concurrent buffer access by different engines beyond the multiple readers/single writer model that we currently use in radeon and other drivers, we need some kind of synchronization object exposed to userspace.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My initial patch set for this used (or rather abused) zero-sized GEM buffers as fence handles. This obviously isn't the best way of doing this (too much overhead, rather ugly, etc...); Jerome commented on it accordingly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So what should a driver expose instead? Android sync points? Something else?
>>>>>>>>>>>>
>>>>>>>>>>>> I think actually exposing the struct fence objects as an fd, using Android syncpts (or at least something compatible with them), is the way to go. The problem is that it's super-hard to get the Android guys out of hiding for this :(
>>>>>>>>>>>>
>>>>>>>>>>>> Adding a bunch of people in the hopes that something sticks.
>>>>>>>>>>>
>>>>>>>>>>> More people.
>>>>>>>>>>
>>>>>>>>>> Just to re-iterate: exposing such a thing while still using a command stream ioctl that does implicit synchronization is a waste; you only get the lowest common denominator, which is implicit synchronization. So I do not see the point of such an API if you are not also adding a new CS ioctl with an explicit contract that it does not do any kind of synchronization (it could be almost the exact same code, modulo not waiting for previous commands to complete).
>>>>>>>>>
>>>>>>>>> Our thinking was to allow explicit sync within a single process, but implicit sync between processes.
>>>>>>>>
>>>>>>>> This is a BIG NAK if you are using the same ioctl, as it would mean you are changing the userspace API, or at least userspace expectations. Adding a new CS flag might do the trick, but it should not be about inter-process or anything special; it's just implicit sync or no synchronization. Converting userspace is not that big a deal either; it can be broken into several steps, e.g. Mesa uses explicit synchronization all the time while the DDX uses implicit.
>>>>>>>
>>>>>>> The thinking here is that we need to be backward compatible with DRI2/3 and support all kinds of different use cases like old DDX and new Mesa, or old Mesa and new DDX, etc...
>>>>>>>
>>>>>>> So in my prototype, if the kernel sees any access to a BO from two different clients it falls back to the old behavior of implicit synchronization of access to the same buffer object. That might not be the fastest approach, but as far as I can see it is conservative and so should work under all conditions.
>>>>>>>
>>>>>>> Apart from that, the planning so far was that we just hide this feature behind a couple of command submission flags and new chunks.
>>>>>>
>>>>>> Just to reproduce the IRC discussion: I think it's a lot simpler and not that complex. For an explicit CS ioctl you do not wait for any previous fence of any of the buffers referenced in the CS ioctl, but you still associate a new fence with all the buffer objects referenced in the CS ioctl. So if the next ioctl is an implicit-sync ioctl it will wait and synchronize properly with the previous explicit CS ioctl. Hence you can easily have a mix in userspace; the thing is you only get the benefit once enough of your userspace is using explicit synchronization.
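A rough model of the rule described above, with every name invented purely for illustration (this is not the radeon code, just the decision logic): an explicit submission skips the wait on fences already attached to the referenced BOs but still attaches its own fence, so a later implicit submission synchronizes correctly.

/* Toy model of the implicit-vs-explicit CS rule; every type and
 * name here is made up for illustration only. */
#include <stddef.h>

struct toy_fence { unsigned long seqno; };

struct toy_bo {
        struct toy_fence *last_fence;   /* fence of the last CS touching this BO */
};

#define TOY_CS_FLAG_EXPLICIT_SYNC (1u << 0)

/* Stand-in for "block until the fence has signaled". */
static void toy_fence_wait(struct toy_fence *f) { (void)f; }

static void toy_cs_submit(struct toy_bo **bos, size_t n,
                          unsigned int flags, struct toy_fence *new_fence)
{
        for (size_t i = 0; i < n; i++) {
                /* Implicit sync: wait for whatever previously touched the BO.
                 * Explicit sync: skip the wait; userspace handles ordering. */
                if (!(flags & TOY_CS_FLAG_EXPLICIT_SYNC) && bos[i]->last_fence)
                        toy_fence_wait(bos[i]->last_fence);

                /* In both modes the new fence is attached, so a later
                 * implicit submission still synchronizes against this one. */
                bos[i]->last_fence = new_fence;
        }
        /* ...queue the command stream to the hardware here... */
}

The point the sketch tries to capture is that the new fence is attached unconditionally; that is what lets explicit and implicit users of the same buffer coexist.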
>>>>>
>>>>> Yes, that's exactly what my patches currently implement.
>>>>>
>>>>> The only difference is that under the current plan I implemented it as a per-BO flag for the command submission, but that was just for testing. Having a single flag to switch between implicit and explicit synchronization for the whole CS IOCTL would do equally well.
>>>>
>>>> Doing it per BO sounds bogus to me, but otherwise yes, we are in agreement. As Daniel said, using an fd is most likely the way we want to do it, but this remains vague.
>>>>
>>>>>> Note that you still need a way for an explicit CS ioctl to wait on a previous "explicit" fence, so you need some API to expose a fence per CS submission.
>>>>>
>>>>> Exactly, that's what this mail thread is all about.
>>>>>
>>>>> As Daniel correctly noted, you need functionality to get a fence as the result of a command submission, as well as to pass in a list of fences to wait for before beginning a command submission.
>>>>>
>>>>> At least it looks like we are all on the same general line here; it's just that nobody has a good idea what the details should look like.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Cheers,
>>>>>> Jérôme
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jérôme
>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>>> Also, one thing that the Android sync point does not have, AFAICT, is a way to schedule synchronization as part of a CS ioctl, so the CPU never has to be involved for command streams that deal with only one GPU (assuming the driver and hardware can do such a trick).
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Jérôme
>>>>>>>>>>
>>>>>>>>>>> -Daniel
>>>>>>>>>>> --
>>>>>>>>>>> Daniel Vetter
>>>>>>>>>>> Software Engineer, Intel Corporation
>>>>>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
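For the record, the interface the thread converges on (a fence returned from each command submission, plus fences to wait on before it starts) could look roughly like the sketch below. This is purely illustrative: none of these structure, field, or flag names exist in any real uapi, and whether it should be expressed as a flag, a new chunk, or a new ioctl is exactly what was still being debated.

/* Hypothetical explicit-sync command submission request; all names
 * are invented for illustration and are not part of any real uapi. */
#include <stdint.h>

#define HYP_CS_FLAG_EXPLICIT_SYNC  (1u << 0)   /* skip implicit BO sync */

struct hyp_cs_request {
        uint32_t flags;           /* HYP_CS_FLAG_* */
        uint32_t num_in_fences;   /* number of fences to wait for */
        uint64_t in_fence_fds;    /* userspace pointer to an array of int fds */
        int32_t  out_fence_fd;    /* returned: fd that signals when the CS retires,
                                     or -1 if the caller did not ask for one */
        uint32_t pad;
        /* ...the existing chunks (IBs, relocations, ...) would follow... */
};

With an Android-style merge ioctl the in-fence array could even collapse to a single fd, as John points out above, and a GPU that can wait on fences itself could in principle consume it without involving the CPU, which is the direction Jerome's last point suggests.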