Re: [PATCH 1/3] drm/suballoc: Introduce a generic suballocation manager

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Fri, 17 Feb 2023 14:51:43 +0100

On 2/17/23 14:18, Christian König wrote:
Am 17.02.23 um 14:10 schrieb Thomas Hellström:
[SNIP]

Any chance you could do a quick performance comparison? If not, 
anything against merging this without the amd / radeon changes 
until we can land a simpler allocator?

Only if you can stick the allocator inside Xe and not drm, cause 
this seems to be for a different use case than the allocators 
inside radeon/amdgpu.

Hmm. No It's allocating in a ring-like fashion as well.  Let me put 
together a unit test for benchmaking. I think it would be a failure 
for the community to end up with three separate suballocators doing 
the exact same thing for the same problem, really.

Well exactly that's the point. Those allocators aren't the same 
because they handle different problems.

The allocator in radeon is simpler because it only had to deal with 
a limited number of fence timelines. The one in amdgpu is a bit more 
complex because of the added complexity for more fence timelines.

We could take the one from amdgpu and use it for radeon and others 
as well, but the allocator proposed here doesn't even remotely 
matches the requirements.

But again, what *are* those missing requirements exactly? What is the 
pathological case you see for the current code?

Well very low CPU overhead and don't do anything in a callback.

Well, dma_fence_wait_any() will IIRC register callbacks on all affected 
fences, although admittedly there is no actual allocator processing in them.

From what I can tell the amdgpu suballocator introduces excessive 
complexity to coalesce waits for fences from the same contexts, 
whereas the present code just frees from the fence callback if the 
fence wasn't already signaled.

And this is exactly the design we had previously which we removed 
after Dave stumbled over tons of problems with it.

So is the worry that those problems have spilled over in this code then? 
It's been pretty extensively tested, or is it you should never really 
use dma-fence callbacks?

The fence signalling code that fires that callback is typcally always 
run anyway on scheduler fences.

The reason we had for not using the amdgpu suballocator as originally 
planned was that this complexity made it very hard for us to 
undertand it and to fix issues we had with it.

Well what are those problems? The idea is actually not that hardware 
to understand.

We hit memory corruption, and we spent substantially more time trying to 
debug it than to put together this patch, while never really 
understanding what  happened, nor why you don't see that with amdgpu.

We could simplify it massively for the cost of only waiting for the 
oldest fence if that helps.

Let me grab the latest version from amdgpu and give it a try again, but 
yes I think that to make it common code we'll need it simpler (and my 
personal wish would be to separate the allocator functionality a bit 
more from the fence waiting, which I guess should be OK if the fence 
waiting is vastly simplified).

/Thomas

Regards,
Christian.

Regards,

Thomas