On Thu, Jun 27, 2019 at 7:20 PM Koenig, Christian <Christian.Koenig@xxxxxxx> wrote: > > Am 27.06.19 um 19:10 schrieb Daniel Vetter: > > On Thu, Jun 27, 2019 at 03:48:06PM +0000, Koenig, Christian wrote: > >> Am 27.06.19 um 17:34 schrieb Daniel Vetter: > >>> On Thu, Jun 27, 2019 at 3:19 PM Christian König > >>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > >>>> Am 27.06.19 um 12:43 schrieb Daniel Vetter: > >>>>> On Thu, Jun 27, 2019 at 12:18 PM Christian König > >>>>> <ckoenig.leichtzumerken@xxxxxxxxx> wrote: > >>>>>> While freeing up memory it is easier to remove a fence from a reservation > >>>>>> object instead of signaling it and installing a new one in all other objects. > >>>>>> > >>>>>> Clean this up by adding the removal function to the core reservation object > >>>>>> code instead of messing with the internal in amdgpu. > >>>>>> > >>>>>> No functional change. > >>>>>> > >>>>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx> > >>>>>> --- > >>>>>> drivers/dma-buf/reservation.c | 62 +++++++++++++++++++ > >>>>>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +------------- > >>>>>> include/linux/reservation.h | 3 +- > >>>>>> 3 files changed, 65 insertions(+), 45 deletions(-) > >>>>>> > >>>>>> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c > >>>>>> index ef710effedfa..e43a316a005d 100644 > >>>>>> --- a/drivers/dma-buf/reservation.c > >>>>>> +++ b/drivers/dma-buf/reservation.c > >>>>>> @@ -231,6 +231,68 @@ void reservation_object_add_shared_fence(struct reservation_object *obj, > >>>>>> } > >>>>>> EXPORT_SYMBOL(reservation_object_add_shared_fence); > >>>>>> > >>>>>> +/** > >>>>>> + * reservation_object_remove_shared_fence - remove shared fences > >>>>>> + * @obj: the reservation object > >>>>>> + * @context: the context of the fences to remove > >>>>>> + * > >>>>>> + * Remove shared fences based on their fence context. > >>>>>> + */ > >>>>> This needs a serious explanation of "why?". Removing fences without > >>>>> guaranteeing they're all signaled is a good way to create havoc. Your > >>>>> commit message has a few words about what you're doing here, but it > >>>>> still doesn't explain why this is safe and when exactly it should be > >>>>> used. > >>>> Yeah, I'm not very keen about this either. > >>>> > >>>> The key point is the memory is not accessible by the hardware any more > >>>> because it is freed and removed from the page tables. > >>>> > >>>> So further access is prevented and in this special case it is actually > >>>> valid to do this even if the operation represented by the fence is still > >>>> ongoing. > >>> Hm, why do you have to remove these fences then? Can't you just let > >>> them signal and get collected as usual? As soon as you share buffers > >>> these fences can get anywhere, so you need to correctly unwind them no > >>> matter what. > >>> > >>> You're kinda still just describing what you're doing, not why. > >> It is simply more efficient to remove the fence from one reservation > >> object than to add a new fence to all other reservation objects in the > >> same process. > > Ok, you still talk in riddles and don't explain what the goal of this > > entire dance is, so I went and read the code. Assuming I didn't misread > > too much: > > > > 1. you create a fake fence on a per-process timeline. > > 2. you attach this liberally to all the bo you're creating on that > > process > > 3. the fence never signals on its own, but it has a very magic > > ->enable_signaling callback which is the only thing that makes this fence > > switch to signalled in a finite time. Before that it's stuck forever. So > > quite a bit a schrödinger fence: It's not a real fence (because it fails > > to signal in bounded time) except when you start to look at it. > > 4. Looking at the fence triggers eviction, at that point we replace this > > magic eviction fence with the next set, reacquire buffers and then unblock > > the kfd process once everything is in shape again. > > > > This is soooooooooooooooooo magic that I really don't think we should > > encourage people without clue to maybe use this and totally break all > > fences guarantees. > > Yeah, that is correct. But this is completely unrelated to why we want > to remove the fence. > > > If you do want to make sure an optimized version within > > reservation_object.c, then it should be code which replaces fences iff: > > - they're the same context > > - later in the ordering within that context > > - of the same type (i.e. safe vs shared) > > > > That would actually be safe thing to do. > > No, that won't work because there is no replacement for the fence in > question. > > See we want to remove the fence because the memory is freed up. > > > Also, the above is what I expected when asking "why do you need this", not > > "we replace fences, its more efficient" I kinda got that from the code :-) > > Well I explained the short version why we do this. What you dug up here > is correct as well, but completely unrelated to removing the fence. > > Again, the reason to remove the fence from one reservation object is > simply that it is faster to remove it from one object than to attach a > new fence to all other objects. Hm I guess I was lead astray by the eviction_fence invalidation thing in enable_signaling, and a few other places that freed the bo right afters (like amdgpu_amdkfd_gpuvm_free_memory_of_gpu), where removing the fences first and then freeing the bo is kinda pointless. Now with your insistence that I'm getting something wrong I guess the you're talking about the unbind case, where the bo survives, but it's mapping disappears, and hence that specific eviction_fence needs to be removed. And yeah there I guess just removing the magic eviction fence is cheaper than replacing all the others. Now I guess I understand the mechanics of this somewhat, and what you're doing, and lit ooks even somewhat safe. But I have no idea what this is supposed to achieve. It feels a bit like ->notify_move, but implemented in the most horrible way possible. Or maybe something else. Really no idea. And given that we've wasted I few pages full of paragraphs already on trying to explain what your new little helper is for, when it's safe to use, when it's maybe not a good idea, and we still haven't even bottomed out on what this is for ... well I really don't think it's a good idea to inflict this into core code. Because just blindly removing normal fences is not safe. Especially with like half a sentence of kerneldoc that explains nothing of all this complexity. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx