Am 17.09.2018 um 11:33 schrieb zhoucm1: > [SNIP] >>> Â +static struct dma_fence * >>> +drm_syncobj_point_get(struct drm_syncobj *syncobj, u64 point, u64 >>> flags) >>> +{ >>> +Â Â Â struct dma_fence *fence; >>> +Â Â Â int ret = 0; >>> + >>> +Â Â Â if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) { >>> +Â Â Â Â Â Â Â ret = wait_event_timeout(syncobj->syncobj_timeline.wq, >>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â point <= syncobj->syncobj_timeline.signal_point, >>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â msecs_to_jiffies(10000)); /* wait 10s */ >> >> Either don't use a timeout or provide the timeout explicitely, but >> don't hard code it here. >> >> Additional to that we probably need an incorruptible wait. > Initially, I used interruptible wait, I found it sometimes often > return -ERESTARTSYS without waiting any second when I wrote unit test. > Any ideas for that? Yeah, that is perfectly normal. The application just received a signal and we need to abort the IOCTL ASAP. > still need to interruptible wait? Yes, that should certainly be an interruptible wait. Otherwise an application could block on this forever when somebody forgets to supply a signal point. >>> Â -Â Â Â for (i = 0; i < args->count_handles; i++) >>> -Â Â Â Â Â Â Â drm_syncobj_replace_fence(syncobjs[i], 0, NULL); >>> - >>> +Â Â Â for (i = 0; i < args->count_handles; i++) { >>> +Â Â Â Â Â Â Â if (syncobjs[i]->type == DRM_SYNCOBJ_TYPE_TIMELINE) { >>> +Â Â Â Â Â Â Â Â Â Â Â DRM_ERROR("timeline syncobj cannot reset!\n"); >> >> Why not? I mean that should still work or do I miss anything? > timeline semaphore spec doesn't require reset interface, it says the > timeline value only can be changed by signal operations. Yeah, but we don't care about the timeline spec in the kernel. Question is rather if that still makes sense to support that and as far as I can see it should be trivial to reinitialize the object. >>> +Â Â Â Â Â Â Â Â Â Â Â ret = -EINVAL; >>> +Â Â Â Â Â Â Â Â Â Â Â goto out; >>> +Â Â Â Â Â Â Â } >>> +Â Â Â Â Â Â Â drm_syncobj_timeline_fini(syncobjs[i], >>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &syncobjs[i]->syncobj_timeline); >>> +Â Â Â Â Â Â Â drm_syncobj_timeline_init(syncobjs[i], >>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &syncobjs[i]->syncobj_timeline); >>> +Â Â Â } >>> +out: >>> Â Â Â Â Â drm_syncobj_array_free(syncobjs, args->count_handles); >>> Â -Â Â Â return 0; >>> +Â Â Â return ret; >>> Â } >>> Â Â int >>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c >>> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c >>> index 0a8d2d64f380..579e91a5858b 100644 >>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c >>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c >>> @@ -2137,7 +2137,9 @@ await_fence_array(struct i915_execbuffer *eb, >>> Â Â Â Â Â Â Â Â Â if (!(flags & I915_EXEC_FENCE_WAIT)) >>> Â Â Â Â Â Â Â Â Â Â Â Â Â continue; >>> Â -Â Â Â Â Â Â Â fence = drm_syncobj_fence_get(syncobj); >>> +Â Â Â Â Â Â Â drm_syncobj_search_fence(syncobj, 0, >>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, >> >> Not sure if we can make DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT the >> default here. >> >> Maybe better if drivers explicitly need to ask for it? > if you don't like the flag used in specific driver, I can wrap it to a > new function. Yeah, that is probably a good idea. >> >>> +Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â &fence); >>> Â Â Â Â Â Â Â Â Â if (!fence) >>> Â Â Â Â Â Â Â Â Â Â Â Â Â return -EINVAL; >>> Â diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h >>> index 425432b85a87..9535521e6623 100644 >>> --- a/include/drm/drm_syncobj.h >>> +++ b/include/drm/drm_syncobj.h >>> @@ -30,6 +30,25 @@ >>> Â Â struct drm_syncobj_cb; >>> Â +enum drm_syncobj_type { >>> +Â Â Â DRM_SYNCOBJ_TYPE_NORMAL, >> >> Does anybody have a better suggestion for _TYPE_NORMAL? That sounds >> like timeline would be special. > How about _TYPE_INDIVIDUAL? Oh, really good idea. > btw: I encounter a problem after removing advanced wait pt, I debug it > two days, but still don't find reason, from log, it's timeount when > wait_event_timeout, that means it don't receive signal operation. Good question. Maybe a race condition? Have you tried to trace both threads to the trace buffer, e.g. use trace_printk? > "./deqp-vk -n dEQP-VK*semaphore*" will fail on > 'dEQP-VK.synchronization.basic.semaphore.chain' case, but can pass if > only run that one case like "./deqp-vk -n > dEQP-VK.synchronization.basic.semaphore.chain". > > Both old and my previous implementation can pass and have no this > problem. Can you reproduce that as well with the unit tests in libdrm? Christian.