On 2018å¹´09æ??04æ?¥ 15:00, Christian König wrote: > Am 04.09.2018 um 06:04 schrieb zhoucm1: >> >> >> On 2018å¹´09æ??03æ?¥ 19:19, Christian König wrote: >>> Am 03.09.2018 um 12:07 schrieb Chunming Zhou: >>>> >>>> >>>> å?¨ 2018/9/3 16:50, Christian König å??é??: >>>>> Am 03.09.2018 um 06:13 schrieb Chunming Zhou: >>>>>> >>>>>> >>>>>> å?¨ 2018/8/30 19:32, Christian König å??é??: >>>>>>> [SNIP] >>>>>>>>>>> >>>>>>>>>>>> + >>>>>>>>>>>> +struct drm_syncobj_wait_pt { >>>>>>>>>>>> +   struct drm_syncobj_stub_fence base; >>>>>>>>>>>> +   u64   value; >>>>>>>>>>>> +   struct rb_node  node; >>>>>>>>>>>> +}; >>>>>>>>>>>> +struct drm_syncobj_signal_pt { >>>>>>>>>>>> +   struct drm_syncobj_stub_fence base; >>>>>>>>>>>> +   struct dma_fence *signal_fence; >>>>>>>>>>>> +   struct dma_fence *pre_pt_base; >>>>>>>>>>>> +   struct dma_fence_cb signal_cb; >>>>>>>>>>>> +   struct dma_fence_cb pre_pt_cb; >>>>>>>>>>>> +   struct drm_syncobj *syncobj; >>>>>>>>>>>> +   u64   value; >>>>>>>>>>>> +   struct list_head list; >>>>>>>>>>>> +}; >>>>>>>>>>> >>>>>>>>>>> What are those two structures good for >>>>>>>>>> They are used to record wait ops points and signal ops points. >>>>>>>>>> For timeline, they are connected by timeline value, works like: >>>>>>>>>>    a. signal pt increase timeline value to signal_pt value >>>>>>>>>> when signal_pt is signaled. signal_pt is depending on >>>>>>>>>> previous pt fence and itself signal fence from signal ops. >>>>>>>>>>    b. wait pt tree checks if timeline value over itself value. >>>>>>>>>> >>>>>>>>>> For normal, works like: >>>>>>>>>>    a. signal pt increase 1 for >>>>>>>>>> syncobj_timeline->signal_point every time when signal ops is >>>>>>>>>> performed. >>>>>>>>>>    b. when wait ops is comming, wait pt will fetch above >>>>>>>>>> last signal pt value as its wait point. wait pt will be only >>>>>>>>>> signaled by equal point value signal_pt. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> and why is the stub fence their base? >>>>>>>>>> Good question, I tried to kzalloc for them as well when I >>>>>>>>>> debug them, I encountered a problem: >>>>>>>>>> I lookup/find wait_pt or signal_pt successfully, but when I >>>>>>>>>> tried to use them, they are freed sometimes, and results in >>>>>>>>>> NULL point. >>>>>>>>>> and generally, when lookup them, we often need their stub >>>>>>>>>> fence as well, and in the meantime, their lifecycle are same. >>>>>>>>>> Above reason, I make stub fence as their base. >>>>>>>>> >>>>>>>>> That sounds like you only did this because you messed up the >>>>>>>>> lifecycle. >>>>>>>>> >>>>>>>>> Additional to that I don't think you correctly considered the >>>>>>>>> lifecycle of the waits and the sync object itself. E.g. >>>>>>>>> blocking in drm_syncobj_timeline_fini() until all waits are >>>>>>>>> done is not a good idea. >>>>>>>>> >>>>>>>>> What you should do instead is to create a fence_array object >>>>>>>>> with all the fence we need to wait for when a wait point is >>>>>>>>> requested. >>>>>>>> Yeah, this is our initial discussion result, but when I tried >>>>>>>> to do that, I found that cannot meet the advance signal >>>>>>>> requirement: >>>>>>>>    a. We need to consider the wait and signal pt value are not >>>>>>>> one-to-one match, it's difficult to find dependent point, at >>>>>>>> least, there is some overhead. >>>>>>> >>>>>>> As far as I can see that is independent of using a fence array >>>>>>> here. See you can either use a ring buffer or an rb-tree, but >>>>>>> when you want to wait for a specific point we need to condense >>>>>>> the not yet signaled fences into an array. >>>>>> again, need to find the range of where the specific point in, we >>>>>> should close to timeline semantics, I also refered the sw_sync.c >>>>>> timeline, usally wait_pt is signalled by timeline point. And I >>>>>> agree we can implement it with several methods, but I don't think >>>>>> there are basical differences. >>>>> >>>>> The problem is that with your current approach you need the >>>>> sync_obj alive for the synchronization to work. That is most >>>>> likely not a good idea. >>>> Indeed, I will fix that. How abount only creating fence array for >>>> every wait pt when syncobj release? when syncobj release, wait pt >>>> must have waited the signal opertation, then we can easily condense >>>> fences for every wait pt. And meantime, we can take timeline based >>>> wait pt advantage. >>> >>> That could work, but I don't see how you want to replace the already >>> issued fence with a fence_array when the sync object is destroyed. >>> >>> Additional to that I would rather prefer a consistent handling, e.g. >>> without extra rarely used code paths. >> Ah, I find a easy way, we just need to make syncobj_timeline >> structure as a reference. This way syncobj itself can be released >> first, wait_pt/signal_pt don't need syncobj at all. >> every wait_pt/signal_pt keep a reference of syncobj_timeline. > > I've thought about that as well, but came to the conclusion that you > run into problems because of circle dependencies. > > E.g. sync_obj references sync_point and sync_point references sync_obj. sync_obj can be freed first, only sync point references syncobj_timeline structure, syncobj_timeline doesn't reference sync_pt, no circle dep. > > Additional to that it is quite a bit larger memory footprint because > you need to keep the sync_obj around as well. all signaled sync_pt are freed immediately except syncobj_timeline sturcture, where does extra memory foootprint take? > >> >>> >>>> >>>>> >>>>> Additional to that you enable signaling without a need from the >>>>> waiting side. That is rather bad for implementations which need >>>>> that optimization. >>>> Do you mean increasing timeline based on signal fence is not >>>> better? only update timeline value when requested by a wait pt? >>> >>> Yes, exactly. >>> >>>> This way, we will not update timeline value immidiately and cannot >>>> free signal pt immidiately, and we also need to consider it to CPU >>>> query and wait. >>> >>> That is actually the better coding style. We usually try to avoid >>> doing things in interrupt handlers as much as possible. >> OK, I see your concern, how about to delay handling to a workqueue? >> this way, we only increase timeline value and wake up workqueue in >> fence cb, is that acceptable? > > A little bit, but not much better. The nice thing with garbage > collection is that the CPU time is accounted to the process causing > the garbage. > >>> >>> How about this idea: >>> 1. Each signaling point is a fence implementation with an rb node. >>> 2. Each node keeps a reference to the last previously inserted node. >>> 3. Each node is referenced by the sync object itself. >>> 4. Before each signal/wait operation we do a garbage collection and >>> remove the first node from the tree as long as it is signaled. >>> >>> 5. When enable_signaling is requested for a node we cascade that to >>> the left using rb_prev. >>>    This ensures that signaling is enabled for the current fence as >>> well as all previous fences. >>> >>> 6. A wait just looks into the tree for the signal point lower or >>> equal of the requested sequence number. After re-thought your idea, I think it doesn't work since there is no timeline value as a line: signal pt value doesn't must be continues, which can be jump by signal operation, like 1, 4, 8, 15, 19, e.g. there are five singal_pt, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19, if a wait pt is 7, do you mean this wait only needs signal_pt1 and signal_pt4??? That's certainly not right, we need to make sure the timeline value is bigger than wait pt value, that means signal_pt8 is need for wait_pt7. >>> >>> 7. When the sync object is released we use >>> rbtree_postorder_for_each_entry_safe() and drop the extra reference >>> to each node, but never call rb_erase! >>>    This way the rb_tree stays in memory, but without a root (e.g. >>> the sync object). It only destructs itself when the looked up >>> references to the nodes are dropped. And here, who will destroy rb node since no one do enable_signaling, and there is no callback to free themselves. Regards, David Zhou >>> >>> Well that is quite a bunch of logic, but I think that should work fine. >> Yeah, it could work, simple timeline reference also can solve 'free' >> problem. > > I think this approach is still quite a bit better, > e.g. you don't run into circle dependency problems, it needs less > memory and each node has always the same size which means we can use a > kmem_cache for it. > > Regards, > Christian. > >> >> Thanks, >> David Zhou >