[PATCH 5/5] [RFC]drm: add syncobj timeline support v3

zhoucm1@xxxxxxx (zhoucm1) · Tue, 4 Sep 2018 12:04:48 +0800



On 2018å¹´09æ??03æ?¥ 19:19, Christian KÃ¶nig wrote:
> Am 03.09.2018 um 12:07 schrieb Chunming Zhou:
>>
>>
>> å?¨ 2018/9/3 16:50, Christian KÃ¶nig å??é??:
>>> Am 03.09.2018 um 06:13 schrieb Chunming Zhou:
>>>>
>>>>
>>>> å?¨ 2018/8/30 19:32, Christian KÃ¶nig å??é??:
>>>>> [SNIP]
>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> +struct drm_syncobj_wait_pt {
>>>>>>>>>> +Â Â Â  struct drm_syncobj_stub_fence base;
>>>>>>>>>> +Â Â Â  u64Â Â Â  value;
>>>>>>>>>> +Â Â Â  struct rb_nodeÂ Â  node;
>>>>>>>>>> +};
>>>>>>>>>> +struct drm_syncobj_signal_pt {
>>>>>>>>>> +Â Â Â  struct drm_syncobj_stub_fence base;
>>>>>>>>>> +Â Â Â  struct dma_fence *signal_fence;
>>>>>>>>>> +Â Â Â  struct dma_fence *pre_pt_base;
>>>>>>>>>> +Â Â Â  struct dma_fence_cb signal_cb;
>>>>>>>>>> +Â Â Â  struct dma_fence_cb pre_pt_cb;
>>>>>>>>>> +Â Â Â  struct drm_syncobj *syncobj;
>>>>>>>>>> +Â Â Â  u64Â Â Â  value;
>>>>>>>>>> +Â Â Â  struct list_head list;
>>>>>>>>>> +};
>>>>>>>>>
>>>>>>>>> What are those two structures good for
>>>>>>>> They are used to record wait ops points and signal ops points.
>>>>>>>> For timeline, they are connected by timeline value, works like:
>>>>>>>> Â Â Â  a. signal pt increase timeline value to signal_pt value 
>>>>>>>> when signal_pt is signaled. signal_pt is depending on previous 
>>>>>>>> pt fence and itself signal fence from signal ops.
>>>>>>>> Â Â Â  b. wait pt tree checks if timeline value over itself value.
>>>>>>>>
>>>>>>>> For normal, works like:
>>>>>>>> Â Â Â  a. signal pt increase 1 for syncobj_timeline->signal_point 
>>>>>>>> every time when signal ops is performed.
>>>>>>>> Â Â Â  b. when wait ops is comming, wait pt will fetch above last 
>>>>>>>> signal pt value as its wait point. wait pt will be only 
>>>>>>>> signaled by equal point value signal_pt.
>>>>>>>>
>>>>>>>>
>>>>>>>>> and why is the stub fence their base?
>>>>>>>> Good question, I tried to kzalloc for them as well when I debug 
>>>>>>>> them, I encountered a problem:
>>>>>>>> I lookup/find wait_pt or signal_pt successfully, but when I 
>>>>>>>> tried to use them, they are freed sometimes, and results in 
>>>>>>>> NULL point.
>>>>>>>> and generally, when lookup them, we often need their stub fence 
>>>>>>>> as well, and in the meantime,Â  their lifecycle are same.
>>>>>>>> Above reason, I make stub fence as their base.
>>>>>>>
>>>>>>> That sounds like you only did this because you messed up the 
>>>>>>> lifecycle.
>>>>>>>
>>>>>>> Additional to that I don't think you correctly considered the 
>>>>>>> lifecycle of the waits and the sync object itself. E.g. blocking 
>>>>>>> in drm_syncobj_timeline_fini() until all waits are done is not a 
>>>>>>> good idea.
>>>>>>>
>>>>>>> What you should do instead is to create a fence_array object 
>>>>>>> with all the fence we need to wait for when a wait point is 
>>>>>>> requested.
>>>>>> Yeah, this is our initial discussion result, but when I tried to 
>>>>>> do that, I found that cannot meet the advance signal requirement:
>>>>>> Â Â Â  a. We need to consider the wait and signal pt value are not 
>>>>>> one-to-one match, it's difficult to find dependent point, at 
>>>>>> least, there is some overhead.
>>>>>
>>>>> As far as I can see that is independent of using a fence array 
>>>>> here. See you can either use a ring buffer or an rb-tree, but when 
>>>>> you want to wait for a specific point we need to condense the not 
>>>>> yet signaled fences into an array.
>>>> again, need to find the range of where the specific point in, we 
>>>> should close to timeline semantics, I also refered the sw_sync.c 
>>>> timeline, usally wait_pt is signalled by timeline point. And I 
>>>> agree we can implement it with several methods, but I don't think 
>>>> there are basical differences.
>>>
>>> The problem is that with your current approach you need the sync_obj 
>>> alive for the synchronization to work. That is most likely not a 
>>> good idea.
>> Indeed, I will fix that. How abount only creating fence array for 
>> every wait pt when syncobj release? when syncobj release, wait pt 
>> must have waited the signal opertation, then we can easily condense 
>> fences for every wait pt. And meantime, we can take timeline based 
>> wait pt advantage.
>
> That could work, but I don't see how you want to replace the already 
> issued fence with a fence_array when the sync object is destroyed.
>
> Additional to that I would rather prefer a consistent handling, e.g. 
> without extra rarely used code paths.
Ah, I find a easy way, we just need to make syncobj_timeline structure 
as a reference. This way syncobj itself can be released first, 
wait_pt/signal_pt don't need syncobj at all.
every wait_pt/signal_pt keep a reference of syncobj_timeline.

>
>>
>>>
>>> Additional to that you enable signaling without a need from the 
>>> waiting side. That is rather bad for implementations which need that 
>>> optimization.
>> Do you mean increasing timeline based on signal fence is not better? 
>> only update timeline value when requested by a wait pt?
>
> Yes, exactly.
>
>> This way, we will not update timeline value immidiately and cannot 
>> free signal pt immidiately, and we also need to consider it to CPU 
>> query and wait.
>
> That is actually the better coding style. We usually try to avoid 
> doing things in interrupt handlers as much as possible.
OK, I see your concern, how about to delay handling to a workqueue? this 
way, we only increase timeline value and wake up workqueue in fence cb, 
is that acceptable?


>
> How about this idea:
> 1. Each signaling point is a fence implementation with an rb node.
> 2. Each node keeps a reference to the last previously inserted node.
> 3. Each node is referenced by the sync object itself.
> 4. Before each signal/wait operation we do a garbage collection and 
> remove the first node from the tree as long as it is signaled.
>
> 5. When enable_signaling is requested for a node we cascade that to 
> the left using rb_prev.
> Â Â Â  This ensures that signaling is enabled for the current fence as 
> well as all previous fences.
>
> 6. A wait just looks into the tree for the signal point lower or equal 
> of the requested sequence number.
>
> 7. When the sync object is released we use 
> rbtree_postorder_for_each_entry_safe() and drop the extra reference to 
> each node, but never call rb_erase!
> Â Â Â  This way the rb_tree stays in memory, but without a root (e.g. the 
> sync object). It only destructs itself when the looked up references 
> to the nodes are dropped.
>
> Well that is quite a bunch of logic, but I think that should work fine.
Yeah, it could work, simple timeline reference also can solve 'free' 
problem.

Thanks,
David Zhou
>
> Regards,
> Christian.
>
>>
>> Thanks,
>> David Zhou
>>
>>>
>>> I suggest to either condense all the fences you need to wait for in 
>>> an array during the wait operation, or reference count the sync_obj 
>>> and only enable the signaling on the fences when requested by a wait.
>>>
>>>>
>>>>>
>>>>>> Â Â Â  b. because we allowed "wait-before-signal", if 
>>>>>> "wait-before-signal" happens, there isn't signal fence which can 
>>>>>> be used to create fence array.
>>>>>
>>>>> Well, again we DON'T support wait-before-signal here. I will 
>>>>> certainly NAK any implementation which tries to do this until we 
>>>>> haven't figured out all the resource management constraints and I 
>>>>> still don't see how we can do this.
>>>> yes, we don't support real wait-before-signal in patch now, just a 
>>>> fake wait-before-signal, which still wait on CS submission until 
>>>> signal operation coming through wait_event, which is the conclusion 
>>>> we disscussed before.
>>>
>>> Well in this case we should call it wait-for-signal and not 
>>> wait-before-signal :)
>>>
>>>>
>>>>>
>>>>>> So timeline value is good to resolve that.
>>>>>>
>>>>>>>
>>>>>>> Otherwise somebody can easily construct a situation where 
>>>>>>> timeline sync obj A waits on B and B waits on A.
>>>>>> Same situation also can happens with fence-array, we only can see 
>>>>>> this is a programming bug with incorrect use.
>>>>>
>>>>> No, fence-array is initialized only once with a static list of 
>>>>> fences. This way it is impossible to add the fence-array to itself 
>>>>> for example.
>>>>>
>>>>> E.g. you can't build circle dependencies with that.
>>>> we already wait for signal operation event, so never build circle 
>>>> dependencies with that. The theory is same.
>>>
>>> Yeah, ok that is certainly true.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>
>>>>>
>>>>>> Better use rbtree_postorder_for_each_entry_safe() for this.
>>>>>>>> From the comments, seems we cannot use it here, we need erase 
>>>>>>>> node here.
>>>>>>>> Comments:
>>>>>>>> Â * Note, however, that it cannot handle other modifications 
>>>>>>>> that re-order the
>>>>>>>> Â * rbtree it is iterating over. This includes calling 
>>>>>>>> rb_erase() on @pos, as
>>>>>>>> Â * rb_erase() may rebalance the tree, causing us to miss some 
>>>>>>>> nodes.
>>>>>>>> Â */
>>>>>>>
>>>>>>> Yeah, but your current implementation has the same problem :)
>>>>>>>
>>>>>>> You need to use something like "while (node = rb_first(...))" 
>>>>>>> instead if you want to destruct the whole tree.
>>>> OK, got it, I will change it in v4.
>>>>
>>>> Thanks,
>>>> David Zhou
>>>
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>