On Mon, 2024-09-02 at 12:33 +0200, Christian König wrote:
> On 02.09.24 11:32, Thomas Hellström wrote:
> > On Mon, 2024-09-02 at 08:13 +1000, Dave Airlie wrote:
> > > On Fri, 30 Aug 2024 at 12:32, Linus Torvalds
> > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > On Fri, 30 Aug 2024 at 14:08, Dave Airlie <airlied@xxxxxxxxx> wrote:
> > > > > The TTM revert is due to some stuttering graphical apps,
> > > > > probably due to longer stalls while prefaulting.
> > > >
> > > > Yeah, trying to pre-fault a PMD worth of pages in one go is
> > > > just crazy talk.
> > > >
> > > > Now, if it was PMD-aligned and you faulted in a single PMD,
> > > > that would be different. But just doing vmf_insert_pfn_prot()
> > > > in a loop is insane.
> > > >
> > > > The code doesn't even stop when it hits a page that already
> > > > existed, and it keeps locking and unlocking the last-level
> > > > page table over and over again.
> > > >
> > > > Honestly, that code is questionable even for the *small*
> > > > value, much less the "a PMD size" case.
> > > >
> > > > Now, if you have an array of 'struct page *', you can use
> > > > vm_insert_pages(), and that's reasonably efficient.
> > > >
> > > > And if you have a *contiguous* area of pfns, you can use
> > > > remap_pfn_range().
> > > >
> > > > But that "insert one pfn at a time" that the drm layer does
> > > > is complete garbage. You're not speeding anything up, you're
> > > > just digging deeper.
> > >
> > > I wonder if there is functionality that could be provided in a
> > > common helper by the mm layers, or if there would be too many
> > > locking interactions to make that sane.
> > >
> > > It seems too fraught with danger for drivers or subsystems to
> > > just be doing this in the simplest way that isn't actually that
> > > smart.
> >
> > Hmm. I see even the "Don't error on prefaults" check was broken
> > at some point :/
> >
> > There have been numerous attempts to address this.
> >
> > remap_pfn_range() was the most recent one tried, at least in the
> > context of the i915 driver, IIRC by Christoph Hellwig, but it had
> > to be ripped out since it requires the mmap_lock in write mode.
> > Here we hold it only in read mode.
> >
> > Then there's the apply_to_page_range() approach used by the igfx
> > functionality of the i915 driver. I don't think we should go that
> > route without turning it into something like vm_insert_pfns()
> > with proper checking. This approach populates all entries of a
> > buffer object.
> >
> > Finally there's the huge-fault attempt that had to be ripped out
> > due to the lack of pmd_special and pud_special flags and the
> > resulting clashes with gup_fast.
> >
> > Perhaps a combination of the latter two, if properly implemented,
> > would be the best choice.
>
> I'm not deep enough into the memory management background to judge
> which approach is the best, just one more data point to provide:
>
> The pre-faulting was increased because of virtualization. When
> KVM/Xen maps a BO into a guest, the switching overhead for each
> fault is so high that mapping a lot of PFNs at the same time
> becomes beneficial.

Since populating at mmap() time is not possible due to eviction /
migration, perhaps one way forward would be to use madvise() to
toggle the prefaulting size: MADV_RANDOM vs MADV_SEQUENTIAL vs
MADV_NORMAL. (Rough sketch at the end of this mail.)

/Thomas

>
> Regards,
> Christian.
>
> >
> > /Thomas
>
> > > Dave.
>
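
PS: A rough sketch of the madvise() idea, just to make it concrete.
Untested and only illustrative: it assumes the TTM fault path would
pick its prefault count from the VM_RAND_READ / VM_SEQ_READ hints
that madvise(MADV_RANDOM) / madvise(MADV_SEQUENTIAL) set on the VMA.
The helper name and the PMD-sized count are made up for illustration;
TTM_BO_VM_NUM_PREFAULT is the existing default (16).

static unsigned long ttm_bo_vm_num_prefault(struct vm_area_struct *vma)
{
	/* madvise(MADV_RANDOM) sets VM_RAND_READ: fault strictly on demand. */
	if (vma->vm_flags & VM_RAND_READ)
		return 1;

	/*
	 * madvise(MADV_SEQUENTIAL) sets VM_SEQ_READ: lets users such as
	 * KVM/Xen mapping a BO into a guest opt in to large prefaults.
	 */
	if (vma->vm_flags & VM_SEQ_READ)
		return PMD_SIZE >> PAGE_SHIFT;

	/* MADV_NORMAL clears both flags: keep the current small default. */
	return TTM_BO_VM_NUM_PREFAULT;
}

The guest-mapping side (e.g. QEMU) would then opt in with
madvise(addr, size, MADV_SEQUENTIAL) on the BO mapping after mmap(),
while everything else keeps the current behaviour.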