Re: [PATCH] dma-buf/fence: make fence context 64 bit

Christian König <deathsimple@xxxxxxxxxxx> · Thu, 19 May 2016 14:21:43 +0200

Am 19.05.2016 um 14:07 schrieb Daniel Vetter:
On Thu, May 19, 2016 at 11:30 AM, Christian König
<deathsimple@xxxxxxxxxxx> wrote:
Am 19.05.2016 um 11:14 schrieb Daniel Vetter:
On Thu, May 19, 2016 at 11:00:36AM +0200, Christian König wrote:
From: Christian König <christian.koenig@xxxxxxx>

Fence contexts are created on the fly (for example) by the GPU scheduler
used
in the amdgpu driver as a result of an userspace request. Because of this
userspace could in theory force a wrap around of the 32bit context number
if it doesn't behave well.

Avoid this by increasing the context number to 64bits. This way even when
userspace manages to allocate a billion contexts per second it takes more
than 500 years for the context number to wrap around.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
Hm, I think it'd be nice to wrap this up into a real struct and then
manage them with some idr or whatever. For debugging we might also want to
keep track of all fences on a given timeline and similar things, so
there will be a need for this in the future.

So if you go through every driver I think it's better to replace the type
with struct fence_context *context while we're at it. Makes it a notch
bigger since we need to add a little bit of error handling to all callers
of fence_context_alloc.

Volunteered? ;-)

Well, that's exactly what I wanted to avoid. 64bit numbers are fast to
allocate and easy to compare.

If I make it a structure then we would need to kmalloc() it and make sure it
is reference counted so it stays alive as long as any fence structure is
alive which is referring to it.

The overhead sounds to much to me, especially since we currently don't have
a real use for that right now.
Hm, I guess if you're worried about the kmalloc we could make
fence_context embeddable. At least I assume you have to allcate
something somewhere already to store the u64, and that something also
needs to be refcounted already (or cleaned up suitably) to make sure
it doesn't disappear before the fences go away.

Nope, while it's true that we allocate the fence context with the 
command submission context in an IOCTL for amdgpu freeing the command 
submission context happens while fences using the fence context are 
still around. E.g. the application could have dies, but some hardware 
operations are still under way referencing the context.

Keeping it as a structure can cause all kind of problems in OOM 
situations. For example we don't have a limit on the number of contexts 
created. E.g. an application could allocate a lot of command submissions 
context, submits a single waiting command and exits.

I'm just raising this because the longer we wait with redoing this
interface the more painful it'll be. Android at least does have a
full-blown struct, and the reason is exclusively for debugging. And
from what I've heard from android devs debugging fence lockups is a
complete pain. That's why I think sooner or later there's no way
around a full blown struct.
I actually don't think so. Having the context as a simple number gives 
much greater flexibility to us because we don't need to worry about any 
overhead.

For debugging you can just put the context number into a hashtable or 
tree if you need to store additional information to it.

Christian.

-Daniel

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel