On Tue, Sep 24, 2024 at 01:41:31AM +0200, Danilo Krummrich wrote: > (adding dri-devel) > > On Mon, Sep 23, 2024 at 02:24:02PM +0000, Zeng, Oak wrote: > > > > This patch is an old one in my back log. I roughly remember I ran into > > > a situation where there were two duplicated VMAs covering > > > > Same virtual address range are kept in gpuvm's RB-tree. One VMA > > > was actually already destroyed. This further caused issues as > > > > The destroyed VMA was found during a GPUVM RB-tree walk. This > > > triggered me to look into the gpuvm merge split logic and end > > > > Up with this patch. This patch did fix that issue. > > > > > > That would indeed be a big issue. As Matt suggests, is there a > > > reproducer? > > > > > > Either way, adding merge support can't be the fix for this, we need a > > > separate > > > one, that's back-portable. > > > > > > > The discussion went on when you were away. See https://patchwork.freedesktop.org/patch/614941/?series=138835&rev=1 > > Yes, I'm aware. But I don't see how this is related to what I said above? > > > > > Matt and me agreed to implement a merge logic in gpuvm, but gpuvm need to check a driver cookie/callback to decide merge or not. > > We reached this conclusion based on some requirement from system allocator design. See more details in above link. > > > > Can you take a look and let us know whether you agree? > > Generally, I'm fine with that, one of my early versions of GPUVM had this. But I > dropped it because I don't saw an immediate benefit. > > From my old change log: > > "Remove merging of GPUVAs; the kernel has limited to none knowlege about > the semantics of mapping sequences. Hence, merging is purely speculative. > It seems that gaining a significant (or at least a measurable) performance > increase through merging is way more likely to happen when userspace is > responsible for merging mappings up to the next larger page size if > possible." > > If the pure number of GPUVA structures is a concern though, I think it's fair to > add it. So, feel to send a patch. > > It's probably also a good idea to double check with my old merge implementation > [1]. It's pretty easy to get this wrong. I'm not saying I got it right, but if > we both ended up with the same logic, it's a good indicator. :) > > However, this should *not* be a solution for an existing bug. Above you mention > a bug related to "duplicated VMAs covering the same virtual address range". This > is unrelated and must be fixed separately. Do you have a way to reproduce this? > I 100% agree with Daniele that if this bug, merging is not the solution. Merging is additional, optional, feature not a bug fix. AFIAK there is not a bug here as our CI or mesa would likely report a memory bug. A reproducer for this would be good if this exists. Matt > [1] https://lore.kernel.org/dri-devel/20230217134422.14116-6-dakr@xxxxxxxxxx/ > > > > > > Also, can we move this on DRI-devel please? > > > > Yes will do. > > > > Oak