On Mon, 2024-09-02 at 12:33 +0200, Christian König wrote:
> On 02.09.24 11:32, Thomas Hellström wrote:
> > On Mon, 2024-09-02 at 08:13 +1000, Dave Airlie wrote:
> > > On Fri, 30 Aug 2024 at 12:32, Linus Torvalds
> > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > On Fri, 30 Aug 2024 at 14:08, Dave Airlie <airlied@xxxxxxxxx> wrote:
> > > > > The TTM revert is due to some stuttering graphical apps,
> > > > > probably due to longer stalls while prefaulting.
> > > >
> > > > Yeah, trying to pre-fault a PMD worth of pages in one go is
> > > > just crazy talk.
> > > >
> > > > Now, if it was PMD-aligned and you faulted in a single PMD,
> > > > that would be different. But just doing vmf_insert_pfn_prot()
> > > > in a loop is insane.
> > > >
> > > > The code doesn't even stop when it hits a page that already
> > > > existed, and it keeps locking and unlocking the last-level
> > > > page table over and over again.
> > > >
> > > > Honestly, that code is questionable even for the *small*
> > > > value, much less the "a PMD size" case.
> > > >
> > > > Now, if you have an array of 'struct page *', you can use
> > > > vm_insert_pages(), and that's reasonably efficient.
> > > >
> > > > And if you have a *contiguous* area of pfns, you can use
> > > > remap_pfn_range().
> > > >
> > > > But that "insert one pfn at a time" that the drm layer does
> > > > is complete garbage. You're not speeding anything up, you're
> > > > just digging deeper.
> > >
> > > I wonder if there is functionality that could be provided in a
> > > common helper by the mm layers, or if there would be too many
> > > locking interactions to make that sane.
> > >
> > > It seems too fraught with danger for drivers or subsystems to
> > > just be doing this in the simplest way that isn't actually that
> > > smart.
> >
> > Hmm. I see even the "Don't error on prefaults" check was broken
> > at some point :/
> >
> > There have been numerous attempts to address this.
> >
> > remap_pfn_range() was the most recent one tried, at least in the
> > context of the i915 driver, IIRC by Christoph Hellwig, but it had
> > to be ripped out since it requires the mmap_lock in write mode.
> > Here we hold it only in read mode.
> >
> > Then there's the apply_to_page_range() approach used by the igfx
> > functionality of the i915 driver. I don't think we should go that
> > route without turning it into something like vm_insert_pfns()
> > with proper checking. This approach populates all entries of a
> > buffer object.
> >
> > Finally there's the huge-fault attempt that had to be ripped out
> > due to the lack of pmd_special and pud_special flags and the
> > resulting clashes with gup_fast.
> >
> > Perhaps a combination of the latter two, if properly implemented,
> > would be the best choice.
>
> I'm not deep enough into the memory management background to judge
> which approach is the best, just one more data point to provide:
>
> The pre-faulting was increased because of virtualization. When
> KVM/Xen maps a BO into a guest, the switching overhead for each
> fault is so high that mapping a lot of PFNs at the same time
> becomes beneficial.

Since populating at mmap() time is not possible due to eviction /
migration, perhaps one way forward would be to use madvise() to
toggle the prefaulting size: MADV_RANDOM vs MADV_SEQUENTIAL vs
MADV_NORMAL. (Rough sketch at the end of this mail.)

/Thomas

>
> Regards,
> Christian.
>
> >
> > /Thomas
>
> > > Dave.
>
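
PS: A rough sketch of the madvise() idea, just to make it concrete.
Untested and only illustrative: it assumes the TTM fault path would
pick its prefault count from the VM_RAND_READ / VM_SEQ_READ hints
that madvise(MADV_RANDOM) / madvise(MADV_SEQUENTIAL) set on the VMA.
The helper name and the PMD-sized count are made up for illustration;
TTM_BO_VM_NUM_PREFAULT is the existing default (16).

static unsigned long ttm_bo_vm_num_prefault(struct vm_area_struct *vma)
{
	/* madvise(MADV_RANDOM) sets VM_RAND_READ: fault strictly on demand. */
	if (vma->vm_flags & VM_RAND_READ)
		return 1;

	/*
	 * madvise(MADV_SEQUENTIAL) sets VM_SEQ_READ: lets users such as
	 * KVM/Xen mapping a BO into a guest opt in to large prefaults.
	 */
	if (vma->vm_flags & VM_SEQ_READ)
		return PMD_SIZE >> PAGE_SHIFT;

	/* MADV_NORMAL clears both flags: keep the current small default. */
	return TTM_BO_VM_NUM_PREFAULT;
}

The guest-mapping side (e.g. QEMU) would then opt in with
madvise(addr, size, MADV_SEQUENTIAL) on the BO mapping after mmap(),
while everything else keeps the current behaviour.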