On Wed, Mar 19, 2025 at 8:39 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> Hi All,
>
> I know this is very last minute, but I was hoping that it might be possible to
> squeeze in a session to discuss the following?
>
> Summary/Background:
>
> On arm64, physically contiguous and naturally aligned regions can take advantage
> of contpte mappings (e.g. 64 KB) to reduce iTLB pressure. However, for file
> regions containing text, current readahead behaviour often yields small,
> misaligned folios, preventing this optimization. This proposal introduces a
> special-case path for executable mappings, performing synchronous reads of an
> architecture-chosen size into large folios (64 KB on arm64). Early performance
> tests on real-world workloads (e.g. nginx, redis, kernel compilation) show ~2-9%
> gains.

AFAIK, MySQL is quite sensitive to iTLB pressure. It would be worth
adding to the tests.

>
> I’ve previously posted attempts to enable this performance improvement ([1],
> [2]), but there were objections and conversation fizzled out. Now that I have
> more compelling performance data, I’m hoping there is now stronger
> justification, and we can find a path forwards.
>
> What I’d Like to Cover:
>
> - Describe how text memory should ideally be mapped and why it benefits
> performance.
>
> - Brief review of performance data.
>
> - Discuss options for the best way to encourage text into large folios:
> - Let the architecture request a preferred size
> - Extend VMA attributes to include preferred THP size hint
> - Provide a sysfs knob
> - Plug into the “mapping min folio order” infrastructure
> - Other approaches?

Did you try LBS? You can have a 64K block size with LBS; it should
create large folios for the page cache, so text should get large folios
automatically (IIRC the arm64 linker script has 64K alignment by default).

Thanks,
Yang

>
> [1] https://lore.kernel.org/all/20240215154059.2863126-1-ryan.roberts@xxxxxxx/
> [2] https://lore.kernel.org/all/20240717071257.4141363-1-ryan.roberts@xxxxxxx/
>
> Thanks,
> Ryan
>