Re: [PATCH] drm/gpuvm: merge adjacent gpuva range during a map operation

Matthew Brost <matthew.brost@xxxxxxxxx> · Tue, 24 Sep 2024 02:00:53 +0000

On Tue, Sep 24, 2024 at 01:41:31AM +0200, Danilo Krummrich wrote:
> (adding dri-devel)
> 
> On Mon, Sep 23, 2024 at 02:24:02PM +0000, Zeng, Oak wrote:
> > > > This patch is an old one in my backlog. I roughly remember running
> > > > into a situation where two duplicated VMAs covering the same virtual
> > > > address range were kept in gpuvm's RB-tree, and one of those VMAs had
> > > > actually already been destroyed. This caused further issues because
> > > > the destroyed VMA was found during a GPUVM RB-tree walk. That led me
> > > > to look into the gpuvm merge/split logic, and I ended up with this
> > > > patch. This patch did fix that issue.
> > > 
> > > That would indeed be a big issue. As Matt suggests, is there a
> > > reproducer?
> > > 
> > > Either way, adding merge support can't be the fix for this; we need a
> > > separate one that's back-portable.
> > > 
> > 
> > The discussion went on when you were away. See https://patchwork.freedesktop.org/patch/614941/?series=138835&rev=1
> 
> Yes, I'm aware. But I don't see how this is related to what I said above?
> 
> > 
> > Matt and I agreed to implement merge logic in gpuvm, but gpuvm needs to check a driver cookie/callback to decide whether to merge.
> > We reached this conclusion based on some requirements from the system allocator design. See more details in the link above.
> > 
> > Can you take a look and let us know whether you agree?
> 
> Generally, I'm fine with that; one of my early versions of GPUVM had this. But I
> dropped it because I didn't see an immediate benefit.
> 
> From my old change log:
> 
>     "Remove merging of GPUVAs; the kernel has limited to no knowledge about
>     the semantics of mapping sequences. Hence, merging is purely speculative.
>     It seems that gaining a significant (or at least a measurable) performance
>     increase through merging is way more likely to happen when userspace is
>     responsible for merging mappings up to the next larger page size if
>     possible."
> 
> If the pure number of GPUVA structures is a concern though, I think it's fair to
> add it. So, feel free to send a patch.
> 
> It's probably also a good idea to double check with my old merge implementation
> [1]. It's pretty easy to get this wrong. I'm not saying I got it right, but if
> we both ended up with the same logic, it's a good indicator. :)
> 
> However, this should *not* be a solution for an existing bug. Above you mention
> a bug related to "duplicated VMAs covering the same virtual address range". This
> is unrelated and must be fixed separately. Do you have a way to reproduce this?
> 

I 100% agree with Danilo that if this is a bug, merging is not the solution.
Merging is an additional, optional feature, not a bug fix.

AFAIK there is no bug here, as our CI or Mesa would likely report a
memory bug. A reproducer would be good if one exists.

Matt

> [1] https://lore.kernel.org/dri-devel/20230217134422.14116-6-dakr@xxxxxxxxxx/
> 
> > 
> > > Also, can we move this to dri-devel, please?
> > 
> > Yes will do.
> > 
> > Oak