Re: [RFC] drm/i915: Add sync framework support to execbuff IOCTL

On 07/06/2015 04:12 PM, Daniel Vetter wrote:
On Mon, Jul 06, 2015 at 03:46:49PM +0100, Tvrtko Ursulin wrote:
On 07/06/2015 03:26 PM, John Harrison wrote:
On 06/07/2015 14:59, Daniel Vetter wrote:
On Mon, Jul 06, 2015 at 01:58:25PM +0100, John Harrison wrote:
On 06/07/2015 10:29, Daniel Vetter wrote:
On Fri, Jul 03, 2015 at 12:17:33PM +0100, Tvrtko Ursulin wrote:
On 07/02/2015 04:55 PM, Chris Wilson wrote:
It would be nice if we could reuse one seqno both for internal/external fences. If you need to expose a fence ordering within a timeline that is based on the creation stamp rather than execution stamp, it seems like we could just add such a stamp when creating the sync_pt and not worry about its relationship to the execution seqno.

Doing so does expose that requests are reordered to userspace since the signalling timeline is not the same as userspace's ordered timeline. Not sure if that is a problem or not.

Afaict the sync uapi is based on waiting for all of a set of fences to retire. It doesn't seem to rely on fence ordering (that is, knowing that fence A will signal before fence B so it need only wait on fence B).

Here's hoping that we can have both simplicity and efficiency...
Jumping in with not even perfect understanding of everything here - but the timeline business has always confused me. There is nothing in the uapi which needs it afaics, and iirc there was some discussion at the time Jesse floated his patches that it can be removed. Based on that, when I squashed his patches and ported them on top of John's request-to-fence conversion it ended up something like the below (manually edited a bit to be less noisy, and some prep patches omitted):

This implements the ioctl based uapi and indeed seqnos are not actually used in waits. So is this insufficient for some reason? (Other than it does not implement the input fence side of things.)
Yeah, android syncpt on top of struct fence embedded in the i915 request is what I'd have expected.
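
A rough sketch of that layering, just to make it concrete - the struct fence base and the container_of() upcast are the standard kernel idiom of the time, but the i915 fields shown are illustrative, not from an actual patch:

#include <linux/fence.h>

/* Illustrative only: the request *is* a fence, so no separate object
 * needs to be allocated to expose it through the sync framework. */
struct drm_i915_gem_request {
	struct fence fence;	/* base class: kref, ops, context, seqno */
	/* ... i915 execution state (ring, context, postfix, ...) ... */
};

static inline struct drm_i915_gem_request *to_i915_request(struct fence *f)
{
	return container_of(f, struct drm_i915_gem_request, fence);
}

A sync_pt/sync_fence wrapping this embedded fence can then be created when userspace asks for an fd - which is exactly where the lifetime concern below comes in.
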
The thing I'm not happy with in that plan is that it leaves the kernel driver at the mercy of user land applications. If we return a fence object to user land via a file descriptor (or indeed any other mechanism) then that fence object must be kept alive until user land closes the file. If the fence object is the one embedded within our request structure then that means user land is effectively pinning our request structure. Given that more and more stuff is being attached to the request, that could be a fair bit of memory tied up that we can do nothing about. E.g. if a rogue/buggy application requests a fence be returned for every batch buffer submitted but never closes them. Whereas, if we go the route of a separate fence object specifically for user land, then they can leak them like a sieve and we won't really care so much.
Userspace can exhaust kernel allocations, that's nothing new. And if we keep the fence separate, userspace simply needs to leak a few more fence fds than if there's a bit more data attached to each one.

The solution to this problem is to have a mem cgroup limit set. No need to complicate our kernel code.

There is still the extra complication that request unreferencing cannot
require any kind of mutex lock if we are allowing it to happen from
outside of the driver. That means the unreference callback must move the
request to a 'please clean me later' list, schedule a worker thread to
run, and thus do the clean up asynchronously.
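
A minimal sketch of that deferred-clean-up pattern, assuming a lockless llist plus a work item; every name here is hypothetical, not taken from the patches under discussion:

#include <linux/llist.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct example_request {
	struct llist_node free_link;	/* hypothetical field */
	/* ... rest of the request state ... */
};

static void deferred_free_fn(struct work_struct *work);

static LLIST_HEAD(deferred_free_list);
static DECLARE_WORK(deferred_free_work, deferred_free_fn);
static DEFINE_MUTEX(driver_mutex);	/* stands in for struct_mutex */

/* Final-unreference callback: may run in any context once the fence is
 * visible outside the driver, so it takes no mutex; it just queues the
 * request on a lockless list and kicks the worker. */
static void example_request_free_cb(struct example_request *req)
{
	if (llist_add(&req->free_link, &deferred_free_list))
		schedule_work(&deferred_free_work);
}

/* Worker: drains the 'please clean me later' list and does the real
 * teardown with the driver mutex held. */
static void deferred_free_fn(struct work_struct *work)
{
	struct llist_node *freed = llist_del_all(&deferred_free_list);
	struct example_request *req, *next;

	mutex_lock(&driver_mutex);
	llist_for_each_entry_safe(req, next, freed, free_link)
		kfree(req);	/* stands in for the real request cleanup */
	mutex_unlock(&driver_mutex);
}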

For this particular issue my solution was to extend the sync_fence
constructor to take a mutex and store it inside the object. Then at
destruction time, which happens at sync_fd->f_ops->release() time, it is
just a matter of calling kref_put_mutex instead of kref_put.

Seemed to work under some quick testing but that is as much as I did back
then.
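
Roughly what that looked like, as a sketch - the release_mutex field and the constructor argument are the extension being described, with illustrative names rather than the actual Android sync code:

#include <linux/kref.h>
#include <linux/mutex.h>
#include <linux/slab.h>

struct sync_fence_example {
	struct kref kref;
	struct mutex *release_mutex;	/* stored at creation time */
	/* ... */
};

static struct sync_fence_example *fence_example_create(struct mutex *lock)
{
	struct sync_fence_example *fence = kzalloc(sizeof(*fence), GFP_KERNEL);

	if (!fence)
		return NULL;
	kref_init(&fence->kref);
	fence->release_mutex = lock;	/* e.g. &dev->struct_mutex */
	return fence;
}

static void fence_example_free(struct kref *kref)
{
	struct sync_fence_example *fence =
		container_of(kref, struct sync_fence_example, kref);

	/* Called with release_mutex held; the caller unlocks it. */
	kfree(fence);
}

/* What sync_fd->f_ops->release() boils down to. */
static void fence_example_put(struct sync_fence_example *fence)
{
	/* Save the mutex pointer first: the object may be gone after the
	 * final put.  kref_put_mutex() only takes the lock if this is
	 * the last reference. */
	struct mutex *lock = fence->release_mutex;

	if (kref_put_mutex(&fence->kref, fence_example_free, lock))
		mutex_unlock(lock);
}

Note the caller has to know to unlock after kref_put_mutex() returns nonzero - the kind of locking detail that leaks into every release path, which is presumably the objection below.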

The problem is that it doesn't scale since everyone wants some other kind of mutex to serialize the final kref_put. If something is supposed to be cross-subsystem/driver (which is the case for fences) then we really can't do that kind of leaky locking design. Imo we should have a "kref_put_mutex considered harmful" sign somewhere ...

I get the argument that everyone wanting to add their own mutex doesn't scale, but I don't see how that ties in with the "leaky" comment? Also, a mutex is a pretty standard thing, especially since kref_put_mutex exists. :D If you look at it from that angle it kind of just exposes in the subclass what the base class can do.

If you have weak references somewhere and need to prevent the object from disappearing untimely while chasing that weak reference then imo the better design pattern is to use kref_get_unless_zero. If you need the serialization the mutex provides for some other reason (someone is only holding the mutex instead of grabbing a proper reference when they really should grab one) then your refcounting scheme probably needs another kind of fixup patch.
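
For comparison, the kref_get_unless_zero() pattern being described looks something like this generic sketch (the lookup list stands in for wherever the weak pointer lives; none of this is i915 code):

#include <linux/kref.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct example_object {
	struct kref kref;
	struct list_head link;	/* weak: the list does not hold a ref */
};

static DEFINE_SPINLOCK(lookup_lock);
static LIST_HEAD(lookup_list);

/* Chase a weak reference: try to take a real one, and skip objects
 * whose refcount has already hit zero (destruction in progress). */
static struct example_object *example_lookup(void)
{
	struct example_object *obj, *found = NULL;

	spin_lock(&lookup_lock);
	list_for_each_entry(obj, &lookup_list, link) {
		if (kref_get_unless_zero(&obj->kref)) {
			found = obj;
			break;
		}
	}
	spin_unlock(&lookup_lock);

	return found;	/* caller now owns a real reference, or NULL */
}

static void example_release(struct kref *kref)
{
	struct example_object *obj =
		container_of(kref, struct example_object, kref);

	/* Unlink under the same lock the lookup uses, so nobody can
	 * chase the weak pointer once the object is gone. */
	spin_lock(&lookup_lock);
	list_del(&obj->link);
	spin_unlock(&lookup_lock);
	kfree(obj);
}

/* Drop a real reference; the last put unlinks and frees. */
static void example_put(struct example_object *obj)
{
	kref_put(&obj->kref, example_release);
}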

I don't see how weak references can work here, since once the request goes away the information is lost, unless it is stored somewhere else.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx



