Re: [PATCH 2/2] dma-buf: cleanup shared fence removal

Christian König <ckoenig.leichtzumerken@xxxxxxxxx> · Fri, 28 Jun 2019 10:40:55 +0200

Am 28.06.19 um 09:30 schrieb Daniel Vetter:
On Fri, Jun 28, 2019 at 8:32 AM Koenig, Christian
<Christian.Koenig@xxxxxxx> wrote:
Am 27.06.19 um 21:57 schrieb Daniel Vetter:
[SNIP]
Again, the reason to remove the fence from one reservation object is
simply that it is faster to remove it from one object than to attach a
new fence to all other objects.
Hm I guess I was lead astray by the eviction_fence invalidation thing
in enable_signaling, and a few other places that freed the bo right
afters (like amdgpu_amdkfd_gpuvm_free_memory_of_gpu), where removing
the fences first and then freeing the bo is kinda pointless.
AH! Now I know where your missing puzzle piece is.

See when we free a buffer object TTM puts it on the delayed delete list
to make sure that it gets freed up only after all fences are signaled.

So this is essentially to make sure the BO gets freed up immediately.
Well yeah you have to wait for outstanding rendering. Everyone does
that. The problem is that ->eviction_fence does not represent
rendering, it's a very contrived way to implement the ->notify_move
callback.

For real fences you have to wait for rendering to complete, putting
aside corner cases like destroying an entire context with all its
pending rendering. Not really something we should optimize for I
think.

No, real fences I don't need to wait for the rendering to complete either.

If userspace said that this per process resource is dead and can be 
removed, we don't need to wait for it to become idle.

Now with your insistence that I'm getting something wrong I guess the
you're talking about the unbind case, where the bo survives, but it's
mapping disappears, and hence that specific eviction_fence needs to be
removed.
And yeah there I guess just removing the magic eviction fence is
cheaper than replacing all the others.
If possible I actually want to apply this to the general case of freeing
up per process resources.

In other words when we don't track resource usage on a per submission
basis freeing up resources is costly because we always have to wait for
the last submission.

But if we can prevent further access to the resource using page tables
it is perfectly valid to free it as soon as the page tables are up to date.
Still not seeing how you can use this outside of the magic amdkfd
eviction_fence.

As I explained you have a per process resource and userspace says that 
this one can go away.

As far as I can see it is perfectly valid to remove all fences from this 
process as soon as the page tables are up to date.

So with the magic amdkfd_eviction fence I agree this makes sense. The
problem I'm having here is that the magic eviction fence itself
doesn't make sense. What I expect will happen (in terms of the new
dynamic dma-buf stuff, I don't have the ttm-ism ready to explain it in
those concepts).

- when you submit command buffers, you _dont_ attach fences to all
involved buffers

That's not going to work because then the memory management then thinks 
that the buffer is immediately movable, which it isn't,

- when you get a ->notify_move there's currently 2 options:
1. inefficient way: wait for the latest command buffer to complete, as
a defensive move. To do that you attach the fence from that command
buffer to the obj in your notifiy_move callback (the kerneldoc doesn't
explain this, but I think we really need this).
2. efficient way: You just unmap from pagetables (and eat/handle the
fault if there is any).

Exactly yeah. As far as I can see for freeing things up that is a 
perfectly valid approach as long as you have a VM which prevents 
accessing this memory.

See ttm_bo_individualize_resv() as well, here we do something similar 
for GFX what the KFD does when it releases memory.

E.g. for per process resources we copy over the current fences into an 
individualized reservation object, to make sure that this can be freed 
up at some time in the future.

But I really want to go a step further and say ok, all fences from this 
process can be removed after we updated the page tables.

No where do you need to remove a fence, because you never attached a
bogus fence.

Now with the magic eviction trick amdkfd uses, you can't do that,
because you need to attach this magic fence to all buffers, all the
time. And since it still needs to work like a fence it's one-shot
only, i.e. instead of a reuseable ->notify_move callback you need to
create a new fence every time ->enable_signalling is called. So in a
way replacing fences is just an artifact of some very, very crazy
calling convention.

If you have a real callback, there's no need for cycling through
fences, and therefore there's also no need to optimize their removal.

Or did you work under the assumption that ->notify_move cannot attach
new fences, and therefore you'd have to roll this magic fence trick to
even more places?

Well that notify_move approach was what was initially suggested by the 
KFD team as well. The problem is simply that there is no general 
notify_move callback for buffers yet.

Adding that one is exactly what my patches for dynamic DMA-buf are 
currently doing :)

Now I guess I understand the mechanics of this somewhat, and what
you're doing, and lit ooks even somewhat safe. But I have no idea what
this is supposed to achieve. It feels a bit like ->notify_move, but
implemented in the most horrible way possible. Or maybe something
else.

Really no idea.

And given that we've wasted I few pages full of paragraphs already on
trying to explain what your new little helper is for, when it's safe
to use, when it's maybe not a good idea, and we still haven't even
bottomed out on what this is for ... well I really don't think it's a
good idea to inflict this into core code. Because just blindly
removing normal fences is not safe.

Especially with like half a sentence of kerneldoc that explains
nothing of all this complexity.
Still makes no sense to me to have in core code.

Well it is still better to have this in the core code than messing with 
the reservation object internals in the driver code.

Regards,
Christian.

-Daniel

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx