On 2018å¹´09æ??03æ?¥ 19:19, Christian König wrote: > Am 03.09.2018 um 12:07 schrieb Chunming Zhou: >> >> >> å?¨ 2018/9/3 16:50, Christian König å??é??: >>> Am 03.09.2018 um 06:13 schrieb Chunming Zhou: >>>> >>>> >>>> å?¨ 2018/8/30 19:32, Christian König å??é??: >>>>> [SNIP] >>>>>>>>> >>>>>>>>>> + >>>>>>>>>> +struct drm_syncobj_wait_pt { >>>>>>>>>> +   struct drm_syncobj_stub_fence base; >>>>>>>>>> +   u64   value; >>>>>>>>>> +   struct rb_node  node; >>>>>>>>>> +}; >>>>>>>>>> +struct drm_syncobj_signal_pt { >>>>>>>>>> +   struct drm_syncobj_stub_fence base; >>>>>>>>>> +   struct dma_fence *signal_fence; >>>>>>>>>> +   struct dma_fence *pre_pt_base; >>>>>>>>>> +   struct dma_fence_cb signal_cb; >>>>>>>>>> +   struct dma_fence_cb pre_pt_cb; >>>>>>>>>> +   struct drm_syncobj *syncobj; >>>>>>>>>> +   u64   value; >>>>>>>>>> +   struct list_head list; >>>>>>>>>> +}; >>>>>>>>> >>>>>>>>> What are those two structures good for >>>>>>>> They are used to record wait ops points and signal ops points. >>>>>>>> For timeline, they are connected by timeline value, works like: >>>>>>>>    a. signal pt increase timeline value to signal_pt value >>>>>>>> when signal_pt is signaled. signal_pt is depending on previous >>>>>>>> pt fence and itself signal fence from signal ops. >>>>>>>>    b. wait pt tree checks if timeline value over itself value. >>>>>>>> >>>>>>>> For normal, works like: >>>>>>>>    a. signal pt increase 1 for syncobj_timeline->signal_point >>>>>>>> every time when signal ops is performed. >>>>>>>>    b. when wait ops is comming, wait pt will fetch above last >>>>>>>> signal pt value as its wait point. wait pt will be only >>>>>>>> signaled by equal point value signal_pt. >>>>>>>> >>>>>>>> >>>>>>>>> and why is the stub fence their base? >>>>>>>> Good question, I tried to kzalloc for them as well when I debug >>>>>>>> them, I encountered a problem: >>>>>>>> I lookup/find wait_pt or signal_pt successfully, but when I >>>>>>>> tried to use them, they are freed sometimes, and results in >>>>>>>> NULL point. >>>>>>>> and generally, when lookup them, we often need their stub fence >>>>>>>> as well, and in the meantime, their lifecycle are same. >>>>>>>> Above reason, I make stub fence as their base. >>>>>>> >>>>>>> That sounds like you only did this because you messed up the >>>>>>> lifecycle. >>>>>>> >>>>>>> Additional to that I don't think you correctly considered the >>>>>>> lifecycle of the waits and the sync object itself. E.g. blocking >>>>>>> in drm_syncobj_timeline_fini() until all waits are done is not a >>>>>>> good idea. >>>>>>> >>>>>>> What you should do instead is to create a fence_array object >>>>>>> with all the fence we need to wait for when a wait point is >>>>>>> requested. >>>>>> Yeah, this is our initial discussion result, but when I tried to >>>>>> do that, I found that cannot meet the advance signal requirement: >>>>>>    a. We need to consider the wait and signal pt value are not >>>>>> one-to-one match, it's difficult to find dependent point, at >>>>>> least, there is some overhead. >>>>> >>>>> As far as I can see that is independent of using a fence array >>>>> here. See you can either use a ring buffer or an rb-tree, but when >>>>> you want to wait for a specific point we need to condense the not >>>>> yet signaled fences into an array. >>>> again, need to find the range of where the specific point in, we >>>> should close to timeline semantics, I also refered the sw_sync.c >>>> timeline, usally wait_pt is signalled by timeline point. And I >>>> agree we can implement it with several methods, but I don't think >>>> there are basical differences. >>> >>> The problem is that with your current approach you need the sync_obj >>> alive for the synchronization to work. That is most likely not a >>> good idea. >> Indeed, I will fix that. How abount only creating fence array for >> every wait pt when syncobj release? when syncobj release, wait pt >> must have waited the signal opertation, then we can easily condense >> fences for every wait pt. And meantime, we can take timeline based >> wait pt advantage. > > That could work, but I don't see how you want to replace the already > issued fence with a fence_array when the sync object is destroyed. > > Additional to that I would rather prefer a consistent handling, e.g. > without extra rarely used code paths. Ah, I find a easy way, we just need to make syncobj_timeline structure as a reference. This way syncobj itself can be released first, wait_pt/signal_pt don't need syncobj at all. every wait_pt/signal_pt keep a reference of syncobj_timeline. > >> >>> >>> Additional to that you enable signaling without a need from the >>> waiting side. That is rather bad for implementations which need that >>> optimization. >> Do you mean increasing timeline based on signal fence is not better? >> only update timeline value when requested by a wait pt? > > Yes, exactly. > >> This way, we will not update timeline value immidiately and cannot >> free signal pt immidiately, and we also need to consider it to CPU >> query and wait. > > That is actually the better coding style. We usually try to avoid > doing things in interrupt handlers as much as possible. OK, I see your concern, how about to delay handling to a workqueue? this way, we only increase timeline value and wake up workqueue in fence cb, is that acceptable? > > How about this idea: > 1. Each signaling point is a fence implementation with an rb node. > 2. Each node keeps a reference to the last previously inserted node. > 3. Each node is referenced by the sync object itself. > 4. Before each signal/wait operation we do a garbage collection and > remove the first node from the tree as long as it is signaled. > > 5. When enable_signaling is requested for a node we cascade that to > the left using rb_prev. >    This ensures that signaling is enabled for the current fence as > well as all previous fences. > > 6. A wait just looks into the tree for the signal point lower or equal > of the requested sequence number. > > 7. When the sync object is released we use > rbtree_postorder_for_each_entry_safe() and drop the extra reference to > each node, but never call rb_erase! >    This way the rb_tree stays in memory, but without a root (e.g. the > sync object). It only destructs itself when the looked up references > to the nodes are dropped. > > Well that is quite a bunch of logic, but I think that should work fine. Yeah, it could work, simple timeline reference also can solve 'free' problem. Thanks, David Zhou > > Regards, > Christian. > >> >> Thanks, >> David Zhou >> >>> >>> I suggest to either condense all the fences you need to wait for in >>> an array during the wait operation, or reference count the sync_obj >>> and only enable the signaling on the fences when requested by a wait. >>> >>>> >>>>> >>>>>>    b. because we allowed "wait-before-signal", if >>>>>> "wait-before-signal" happens, there isn't signal fence which can >>>>>> be used to create fence array. >>>>> >>>>> Well, again we DON'T support wait-before-signal here. I will >>>>> certainly NAK any implementation which tries to do this until we >>>>> haven't figured out all the resource management constraints and I >>>>> still don't see how we can do this. >>>> yes, we don't support real wait-before-signal in patch now, just a >>>> fake wait-before-signal, which still wait on CS submission until >>>> signal operation coming through wait_event, which is the conclusion >>>> we disscussed before. >>> >>> Well in this case we should call it wait-for-signal and not >>> wait-before-signal :) >>> >>>> >>>>> >>>>>> So timeline value is good to resolve that. >>>>>> >>>>>>> >>>>>>> Otherwise somebody can easily construct a situation where >>>>>>> timeline sync obj A waits on B and B waits on A. >>>>>> Same situation also can happens with fence-array, we only can see >>>>>> this is a programming bug with incorrect use. >>>>> >>>>> No, fence-array is initialized only once with a static list of >>>>> fences. This way it is impossible to add the fence-array to itself >>>>> for example. >>>>> >>>>> E.g. you can't build circle dependencies with that. >>>> we already wait for signal operation event, so never build circle >>>> dependencies with that. The theory is same. >>> >>> Yeah, ok that is certainly true. >>> >>> Regards, >>> Christian. >>> >>>> >>>> >>>>> >>>>>> Better use rbtree_postorder_for_each_entry_safe() for this. >>>>>>>> From the comments, seems we cannot use it here, we need erase >>>>>>>> node here. >>>>>>>> Comments: >>>>>>>>  * Note, however, that it cannot handle other modifications >>>>>>>> that re-order the >>>>>>>>  * rbtree it is iterating over. This includes calling >>>>>>>> rb_erase() on @pos, as >>>>>>>>  * rb_erase() may rebalance the tree, causing us to miss some >>>>>>>> nodes. >>>>>>>>  */ >>>>>>> >>>>>>> Yeah, but your current implementation has the same problem :) >>>>>>> >>>>>>> You need to use something like "while (node = rb_first(...))" >>>>>>> instead if you want to destruct the whole tree. >>>> OK, got it, I will change it in v4. >>>> >>>> Thanks, >>>> David Zhou >>> >> >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx at lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >