Am 04.09.2018 um 06:04 schrieb zhoucm1: > > > On 2018å¹´09æ??03æ?¥ 19:19, Christian König wrote: >> Am 03.09.2018 um 12:07 schrieb Chunming Zhou: >>> >>> >>> å?¨ 2018/9/3 16:50, Christian König å??é??: >>>> Am 03.09.2018 um 06:13 schrieb Chunming Zhou: >>>>> >>>>> >>>>> å?¨ 2018/8/30 19:32, Christian König å??é??: >>>>>> [SNIP] >>>>>>>>>> >>>>>>>>>>> + >>>>>>>>>>> +struct drm_syncobj_wait_pt { >>>>>>>>>>> +   struct drm_syncobj_stub_fence base; >>>>>>>>>>> +   u64   value; >>>>>>>>>>> +   struct rb_node  node; >>>>>>>>>>> +}; >>>>>>>>>>> +struct drm_syncobj_signal_pt { >>>>>>>>>>> +   struct drm_syncobj_stub_fence base; >>>>>>>>>>> +   struct dma_fence *signal_fence; >>>>>>>>>>> +   struct dma_fence *pre_pt_base; >>>>>>>>>>> +   struct dma_fence_cb signal_cb; >>>>>>>>>>> +   struct dma_fence_cb pre_pt_cb; >>>>>>>>>>> +   struct drm_syncobj *syncobj; >>>>>>>>>>> +   u64   value; >>>>>>>>>>> +   struct list_head list; >>>>>>>>>>> +}; >>>>>>>>>> >>>>>>>>>> What are those two structures good for >>>>>>>>> They are used to record wait ops points and signal ops points. >>>>>>>>> For timeline, they are connected by timeline value, works like: >>>>>>>>>    a. signal pt increase timeline value to signal_pt value >>>>>>>>> when signal_pt is signaled. signal_pt is depending on previous >>>>>>>>> pt fence and itself signal fence from signal ops. >>>>>>>>>    b. wait pt tree checks if timeline value over itself value. >>>>>>>>> >>>>>>>>> For normal, works like: >>>>>>>>>    a. signal pt increase 1 for syncobj_timeline->signal_point >>>>>>>>> every time when signal ops is performed. >>>>>>>>>    b. when wait ops is comming, wait pt will fetch above last >>>>>>>>> signal pt value as its wait point. wait pt will be only >>>>>>>>> signaled by equal point value signal_pt. >>>>>>>>> >>>>>>>>> >>>>>>>>>> and why is the stub fence their base? >>>>>>>>> Good question, I tried to kzalloc for them as well when I >>>>>>>>> debug them, I encountered a problem: >>>>>>>>> I lookup/find wait_pt or signal_pt successfully, but when I >>>>>>>>> tried to use them, they are freed sometimes, and results in >>>>>>>>> NULL point. >>>>>>>>> and generally, when lookup them, we often need their stub >>>>>>>>> fence as well, and in the meantime, their lifecycle are same. >>>>>>>>> Above reason, I make stub fence as their base. >>>>>>>> >>>>>>>> That sounds like you only did this because you messed up the >>>>>>>> lifecycle. >>>>>>>> >>>>>>>> Additional to that I don't think you correctly considered the >>>>>>>> lifecycle of the waits and the sync object itself. E.g. >>>>>>>> blocking in drm_syncobj_timeline_fini() until all waits are >>>>>>>> done is not a good idea. >>>>>>>> >>>>>>>> What you should do instead is to create a fence_array object >>>>>>>> with all the fence we need to wait for when a wait point is >>>>>>>> requested. >>>>>>> Yeah, this is our initial discussion result, but when I tried to >>>>>>> do that, I found that cannot meet the advance signal requirement: >>>>>>>    a. We need to consider the wait and signal pt value are not >>>>>>> one-to-one match, it's difficult to find dependent point, at >>>>>>> least, there is some overhead. >>>>>> >>>>>> As far as I can see that is independent of using a fence array >>>>>> here. See you can either use a ring buffer or an rb-tree, but >>>>>> when you want to wait for a specific point we need to condense >>>>>> the not yet signaled fences into an array. >>>>> again, need to find the range of where the specific point in, we >>>>> should close to timeline semantics, I also refered the sw_sync.c >>>>> timeline, usally wait_pt is signalled by timeline point. And I >>>>> agree we can implement it with several methods, but I don't think >>>>> there are basical differences. >>>> >>>> The problem is that with your current approach you need the >>>> sync_obj alive for the synchronization to work. That is most likely >>>> not a good idea. >>> Indeed, I will fix that. How abount only creating fence array for >>> every wait pt when syncobj release? when syncobj release, wait pt >>> must have waited the signal opertation, then we can easily condense >>> fences for every wait pt. And meantime, we can take timeline based >>> wait pt advantage. >> >> That could work, but I don't see how you want to replace the already >> issued fence with a fence_array when the sync object is destroyed. >> >> Additional to that I would rather prefer a consistent handling, e.g. >> without extra rarely used code paths. > Ah, I find a easy way, we just need to make syncobj_timeline structure > as a reference. This way syncobj itself can be released first, > wait_pt/signal_pt don't need syncobj at all. > every wait_pt/signal_pt keep a reference of syncobj_timeline. I've thought about that as well, but came to the conclusion that you run into problems because of circle dependencies. E.g. sync_obj references sync_point and sync_point references sync_obj. Additional to that it is quite a bit larger memory footprint because you need to keep the sync_obj around as well. > >> >>> >>>> >>>> Additional to that you enable signaling without a need from the >>>> waiting side. That is rather bad for implementations which need >>>> that optimization. >>> Do you mean increasing timeline based on signal fence is not better? >>> only update timeline value when requested by a wait pt? >> >> Yes, exactly. >> >>> This way, we will not update timeline value immidiately and cannot >>> free signal pt immidiately, and we also need to consider it to CPU >>> query and wait. >> >> That is actually the better coding style. We usually try to avoid >> doing things in interrupt handlers as much as possible. > OK, I see your concern, how about to delay handling to a workqueue? > this way, we only increase timeline value and wake up workqueue in > fence cb, is that acceptable? A little bit, but not much better. The nice thing with garbage collection is that the CPU time is accounted to the process causing the garbage. >> >> How about this idea: >> 1. Each signaling point is a fence implementation with an rb node. >> 2. Each node keeps a reference to the last previously inserted node. >> 3. Each node is referenced by the sync object itself. >> 4. Before each signal/wait operation we do a garbage collection and >> remove the first node from the tree as long as it is signaled. >> >> 5. When enable_signaling is requested for a node we cascade that to >> the left using rb_prev. >>    This ensures that signaling is enabled for the current fence as >> well as all previous fences. >> >> 6. A wait just looks into the tree for the signal point lower or >> equal of the requested sequence number. >> >> 7. When the sync object is released we use >> rbtree_postorder_for_each_entry_safe() and drop the extra reference >> to each node, but never call rb_erase! >>    This way the rb_tree stays in memory, but without a root (e.g. >> the sync object). It only destructs itself when the looked up >> references to the nodes are dropped. >> >> Well that is quite a bunch of logic, but I think that should work fine. > Yeah, it could work, simple timeline reference also can solve 'free' > problem. I think this approach is still quite a bit better, e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it. Regards, Christian. > > Thanks, > David Zhou