On Mon, Jul 06, 2015 at 04:21:28PM +0100, Tvrtko Ursulin wrote: > > On 07/06/2015 04:12 PM, Daniel Vetter wrote: > >On Mon, Jul 06, 2015 at 03:46:49PM +0100, Tvrtko Ursulin wrote: > >>On 07/06/2015 03:26 PM, John Harrison wrote: > >>>On 06/07/2015 14:59, Daniel Vetter wrote: > >>>>On Mon, Jul 06, 2015 at 01:58:25PM +0100, John Harrison wrote: > >>>>>On 06/07/2015 10:29, Daniel Vetter wrote: > >>>>>>On Fri, Jul 03, 2015 at 12:17:33PM +0100, Tvrtko Ursulin wrote: > >>>>>>>On 07/02/2015 04:55 PM, Chris Wilson wrote: > >>>>>>>>It would be nice if we could reuse one seqno both for > >>>>>>>>internal/external > >>>>>>>>fences. If you need to expose a fence ordering within a timeline > >>>>>>>>that is > >>>>>>>>based on the creation stamp rather than execution stamp, it seems > >>>>>>>>like > >>>>>>>>we could just add such a stamp when creating the sync_pt and not > >>>>>>>>worry > >>>>>>>>about its relationship to the execution seqno. > >>>>>>>> > >>>>>>>>Doing so does expose that requests are reordered to userspace > >>>>>>>>since the > >>>>>>>>signalling timeline is not the same as userspace's ordered > >>>>>>>>timeline. Not > >>>>>>>>sure if that is a problem or not. > >>>>>>>> > >>>>>>>>Afaict the sync uapi is based on waiting for all of a set of > >>>>>>>>fences to > >>>>>>>>retire. It doesn't seem to rely on fence ordering (that is knowing > >>>>>>>>that > >>>>>>>>fence A will signal before fence B so it need only wait on fence B). > >>>>>>>> > >>>>>>>>Here's hoping that we can have both simplicity and efficiency... > >>>>>>>Jumping in with not even perfect understanding of everything here - > >>>>>>>but > >>>>>>>timeline business has always been confusing me. There is nothing in > >>>>>>>the > >>>>>>>uapi which needs it afaics and iirc there was some discussion at > >>>>>>>the time > >>>>>>>Jesse floated his patches that it can be removed. Based on that when I > >>>>>>>squashed his patches and ported them on top of John's request to fence > >>>>>>>conversion it ended up something like the below (manually edited a > >>>>>>>bit to > >>>>>>>be less noisy and some prep patches omitted): > >>>>>>> > >>>>>>>This implements the ioctl based uapi and indeed seqnos are not > >>>>>>>actually > >>>>>>>used in waits. So is this insufficient for some reason? (Other that it > >>>>>>>does not implement the input fence side of things.) > >>>>>>Yeah android syncpt on top of struct fence embedded int i915 request is > >>>>>>what I'd have expected. > >>>>>The thing I'm not happy with in that plan is that it leaves the kernel > >>>>>driver at the mercy of user land applications. If we return a fence > >>>>>object > >>>>>to user land via a file descriptor (or indeed any other mechanism) > >>>>>then that > >>>>>fence object must be locked until user land closes the file. If the > >>>>>fence > >>>>>object is the one embedded within our request structure then that > >>>>>means user > >>>>>land is effectively locking our request structure. Given that more > >>>>>and more > >>>>>stuff is being attached to the request, that could be a fair bit of > >>>>>memory > >>>>>tied up that we can do nothing about. E.g. if a rogue/buggy application > >>>>>requests a fence be returned for every batch buffer submitted but never > >>>>>closes them. Whereas, if we go the route of a separate fence object > >>>>>specifically for user land then they can leak them like a sieve and > >>>>>we won't > >>>>>really care so much. > >>>>Userspace can exhaust kernel allocations, that's nothing new. And if we > >>>>keep it userspace simply needs to leak a few more fence fds than if > >>>>there's a bit more data attached to it. > >>>> > >>>>The solution to this problem is to have a mem cgroup limit set. No > >>>>need to > >>>>complicate our kernel code. > >>> > >>>There is still the extra complication that request unreferencing cannot > >>>require any kind of mutex lock if we are allowing it to happen from > >>>outside of the driver. That means the unreference callback must move the > >>>request to a 'please clean me later' list, schedule a worker thread to > >>>run, and thus do the clean up asynchronously. > >> > >>For this particular issue my solution was to extend the sync_fence > >>constructor to take a mutex and store it inside the object. Then at > >>destruction time, which happens at sync_fd->f_ops->release() time, it is > >>just a matter of calling kref_put_mutex instead of kref_put. > >> > >>Seemed to work under some quick testing but that is as much as I did back > >>then. > > > >The problem is that it doesn't scale since everyone wants some other kind > >of mutex to serialize the final kref_put. If something is supposed to be > >cross-subsystem/driver (which is the case for fences) then we really can't > >do that kind of leaky locking design. Imo we should have a kref_put_mutex > >considered harmful sign somewhere ... > > I get the argument about everything wanting to add their own not scaling, > but don't tie with the leaky comment? Also mutex is a pretty standard thing, > especially since kref_put_mutex. :D If you look at it from that angle it > kind of just exposes to the super class what the base class can do. leaky not as in leaking memory, but leaky abstraction - we impose locking internals for i915 onto users of the fence interface (somewhat at least). > > >If you have weak references somewhere and need to prevent the object from > >disappearing untimely while chasing that weak reference then imo the > >better design pattern is to use kref_get_unless_zero. If you need the > >serialization the mutex provides for some other reason (someone is only > >hodling the mutex instead of grabbing a proper refernce when they really > >should grab one) then your refcounting scheme probably needs another kind > >of fixup patch. > > I don't see how weak references can work since if the request goes > information is lost, unless stored somewhere else. So weak refernce is a pointer which doesn't hold a reference to the object controlled with kref, but protected by some lock. Latest when the object is destroyed we need to clear that pointer in the free callback of kref_put (and so also grab the lock that protects that pointer). The problem is that anyone else chasing that weak reference might race with the final kref_put and increase the refcount from 0 to 1, which isn't good. There's two ways to do that: - kref_put_mutex on the release side. - in the acquire side do a kref_get_unless_zero _while_ holding that lock. Two upsides of the later approach: - You can have an unrestricted amount of weak references, each protected with their own lock. kref_put_mutex only works up to one. - It doesn't serialize the final kref_put with the lock - doing that allows folks to rely on the mutex instead of a proper refcount to make the obj stick around, which is imo a design antipattern that I suffered through trying to cleaning it up agian a lot. But that's really a tangent to the discussion here, I have no idea whether this applies here since I didn't read your patch in detail. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx