Re: [RFC] drm/i915: Add sync framework support to execbuff IOCTL

On 02/07/2015 14:22, Chris Wilson wrote:
On Thu, Jul 02, 2015 at 02:01:56PM +0100, John Harrison wrote:
On 02/07/2015 12:54, Chris Wilson wrote:
On Thu, Jul 02, 2015 at 12:09:59PM +0100, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>

Various projects desire a mechanism for managing dependencies between
work items asynchronously. This can also include work items across
completely different and independent systems. For example, an
application wants to retrieve a frame from a video-in device, use it
for rendering on a GPU, then send it to the video-out device for
display, all without having to stall waiting for completion along the
way. The sync framework allows this. It encapsulates synchronisation
events in file descriptors. The application can request a sync point
for the completion of each piece of work. Drivers should also accept
sync points with each new work request and not schedule the work to
start until the sync point has been signalled.
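
As a point of reference, waiting on such a sync point from userspace
needs nothing driver-specific: a sync fd supports poll() and becomes
readable once its fence has signalled. A minimal sketch:

#include <poll.h>

/* Sketch only: block until the sync point behind 'fence_fd' signals,
 * or until 'timeout_ms' elapses. */
static int wait_sync_fd(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;		/* poll error */
	return ret ? 0 : -1;		/* zero fds ready => timeout */
}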

This patch adds sync framework support to the exec buffer IOCTL. A
sync point can be passed in to stall execution of the batch buffer
until signalled. And a sync point can be returned after each batch
buffer submission which will be signalled upon that batch buffer's
completion.

At present, the input sync point is simply waited on synchronously
inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
this will be handled asynchronously inside the scheduler and the IOCTL
can return without having to wait.
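
Illustratively, userspace usage might then look like the sketch below.
The flag names and the packing of both fds into rsvd2 are assumptions
made for this example, not necessarily the ABI proposed here:

#include <xf86drm.h>
#include <i915_drm.h>

/* Sketch only: I915_EXEC_FENCE_IN/OUT and the rsvd2 packing are
 * hypothetical. 'execbuf' is assumed to be otherwise filled in
 * (buffers_ptr, buffer_count, batch_len, ...). */
static int submit_with_fences(int drm_fd,
			      struct drm_i915_gem_execbuffer2 *execbuf,
			      int in_fence_fd, int *out_fence_fd)
{
	execbuf->flags |= I915_EXEC_FENCE_IN;	/* stall until in-fence signals */
	execbuf->flags |= I915_EXEC_FENCE_OUT;	/* request a completion fence */
	execbuf->rsvd2 = (__u32)in_fence_fd;

	if (drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf))
		return -1;

	*out_fence_fd = (int)(execbuf->rsvd2 >> 32);	/* new sync point */
	return 0;
}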

Note also that the scheduler will re-order the execution of batch
buffers, e.g. because a batch buffer is stalled on a sync point and
cannot be submitted yet but other, independent, batch buffers are
being presented to the driver. This means that the timeline within the
sync points returned cannot be global to the engine. Instead, timelines
must be kept per context per engine (the scheduler may not re-order batches
within a context). Hence the timeline cannot be based on the existing
seqno values but must be a new implementation.
But there is nothing preventing assignment of the sync value on
submission. Other than the debug .fence_value_str it's a private
implementation detail, and the interface is solely through the fd and
signalling.
No, it needs to be public from the moment of creation. The sync
framework API allows sync points to be combined together to create
fences that either merge multiple points on the same timeline or
amalgamate points across differing timelines. The merging part means
that the sync point must be capable of doing arithmetic comparisons
with other sync points from the instant it is returned to user land.
And those comparisons must not change in the future due to scheduler
re-ordering because by then it is too late to redo the test.
You know that's not documented at all. The only information userspace
gets, afaict, is:

struct sync_pt_info {
	__u32   len;		/* length of this entry, including driver_data */
	char    obj_name[32];	/* name of the parent sync_timeline */
	char    driver_name[32];/* name of the driver implementing the timeline */
	__s32   status;		/* 1: signalled, 0: active, <0: error */
	__u64   timestamp_ns;	/* time of status change, in nanoseconds */

	__u8    driver_data[0];	/* optional driver-dependent data */
};

There is a merge operation done by combining two fences into a new one.
Merging is done by ordering the fences based on the context pointers and
then by sync_pt->fence.seqno, not the private sync value.
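
(For completeness, the merge is driven from userspace roughly as
below, per the staging sync ABI; treat the details as a sketch:)

#include <linux/types.h>
#include <string.h>
#include <sys/ioctl.h>

/* From the staging uapi (drivers/staging/android/uapi/sync.h): */
struct sync_merge_data {
	__s32	fd2;		/* fd of the second fence */
	char	name[32];	/* name for the new fence */
	__s32	fence;		/* fd of the merged fence (out) */
};
#define SYNC_IOC_MAGIC	'>'
#define SYNC_IOC_MERGE	_IOWR(SYNC_IOC_MAGIC, 1, struct sync_merge_data)

/* Sketch only: merge two sync fds into a single new fence fd. */
static int merge_sync_fds(int fd1, int fd2)
{
	struct sync_merge_data data;

	memset(&data, 0, sizeof(data));
	data.fd2 = fd2;
	strncpy(data.name, "merged", sizeof(data.name) - 1);

	if (ioctl(fd1, SYNC_IOC_MERGE, &data) < 0)
		return -1;
	return data.fence;	/* kernel fills in the new fd */
}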

How does userspace try to order the fences other than as an opaque fd? You
actually mean driver_data is undefined ABI...
Hmm, something looks confused. Way back when (i.e. in the shipping
Android tree), the 'private' seqno value was very definitely being
exposed in various ways, but it looks like it has actually been
superseded by the seqno value inside the (new) fence object that is
inside the sync_pt.

Despite that, there is still a 'compare' callback in the timeline_ops
for doing comparisons between sync_pts based on their private,
implementation-specific seqno. Although this is marked as 'required',
it is not actually called anywhere anymore! So it looks like we can
drop the 'private' seqno value completely and just use the fence
version.
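
For anyone following along, the orphaned hook is the compare callback
in sync_timeline_ops; legacy implementations looked roughly like the
sketch below (the example_pt type is invented for illustration):

struct example_pt {
	struct sync_pt pt;
	u32 value;		/* private, implementation-specific seqno */
};

/* Sketch of the legacy pattern: return 1, 0 or -1 for a > b, a == b,
 * a < b, using a wraparound-safe comparison of the private seqnos. */
static int example_pt_compare(struct sync_pt *a, struct sync_pt *b)
{
	struct example_pt *pa = container_of(a, struct example_pt, pt);
	struct example_pt *pb = container_of(b, struct example_pt, pt);

	if (pa->value == pb->value)
		return 0;
	return ((s32)(pa->value - pb->value) > 0) ? 1 : -1;
}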

In my defense, this code is all coming from the Android tree and the port to the nightly was done by someone else. I hadn't realised that nightly had changed quite so significantly.

You could implement this as a secondary write to the HWS,
assigning the sync_value to the sync_pt on submission and
remove the request tracking, as when signalled you only need to compare
the sync_value against the timeline value in the HWS.

However, that equally applies to the existing request->seqno. That can
also be assigned on submission so that it is always an ordered timeline,
and so can be used internally or externally.
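
That is, signalling would reduce to a seqno comparison against the HWS
slot; a minimal sketch, with hypothetical helper and field names:

/* Sketch only: 'timeline_hws_read' and 'sync_value' are invented
 * names. The engine performs a secondary write of the timeline value
 * to its hardware status page; has_signaled is then a wraparound-safe
 * compare of that value against the value assigned on submission. */
static int example_pt_has_signaled(struct example_pt *pt)
{
	u32 hw = timeline_hws_read(pt);	/* last value the GPU wrote */

	return (s32)(hw - pt->sync_value) >= 0;
}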
One of the scheduler patches is to defer seqno assignment until
batch submission rather than do it at request creation (for
execbuffer requests). You still have a problem with pre-emption
though. A request that is pre-empted will get a new seqno assigned
when it is resubmitted so that the HWS page always sees ordered
values popping out. For internal requests, this is fine but for
external sync points that breaks the assumptions made by the
framework.
I fail to see how. Nothing in uapi/sync.h says anything about the order
of fences or gives any such guarantees. If the external callers only
have access through the fd, there is no restriction that the timeline
sync_pt->value must be set prior to submission.

As noted above, it looks like the new version of the sync framework now allocates its own context/value seqno pair internally at creation time. So all my comments really refer to that seqno inside the struct fence. It seems like a bad idea that we are trying to shoehorn another seqno value on top.
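
(For reference, the struct fence core's context/value pairing works
roughly as below; the wrapper names are invented for illustration:)

#include <linux/fence.h>
#include <linux/spinlock.h>

/* Sketch only: each per-context, per-engine timeline takes its own
 * fence context from the core, so seqnos need only be ordered within
 * that context. */
struct example_timeline {
	unsigned int context;	/* from fence_context_alloc() */
	unsigned int seqno;	/* last value handed out */
	spinlock_t lock;
};

static void example_timeline_init(struct example_timeline *tl)
{
	tl->context = fence_context_alloc(1);
	tl->seqno = 0;
	spin_lock_init(&tl->lock);
}

static void example_fence_create(struct example_timeline *tl,
				 struct fence *f,
				 const struct fence_ops *ops)
{
	fence_init(f, ops, &tl->lock, tl->context, ++tl->seqno);
}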

Back to the drawing board...


-Chris

