On Tue, Feb 27, 2024 at 08:11:22PM +0000, Catalin Marinas wrote:
> On Wed, Feb 07, 2024 at 09:45:59AM +0000, Oliver Upton wrote:

[...]

> > Think of the precedent this would establish. What would stop
> > implementers from, say, changing out our memcpy implementation into
> > a hundred different uarch-specific routines? That isn't maintainable,
> > nor is it even testable as most folks don't have access to your
> > hardware.
>
> I agree. FTR, I'm fine with uarch optimisations if (a) they don't
> run-time patch the kernel binary, (b) don't affect the existing hardware
> and (c) show significant gains on the targeted uarch in some meaningful
> benchmarks (definitely not microbenchmark hammering a certain kernel
> path).

and (d) they have a minimal, maintainable code footprint :)

> So, if one wants an optimisation, it better benefits the other
> implementations or at least it doesn't make them worse. Now, we do have
> hardware from mobiles to large enterprise systems, so at some point we
> may have to make a call on different kernel behaviours, possibly even at
> run-time. We already do this at build-time, e.g. CONFIG_NUMA where it
> doesn't make much sense in a mobile (yet). But they should not be seen
> as uarch specific tweaks, more like higher-level classes of
> optimisations.

Agreed. I think the way we handled this case is a great example of how
these sorts of things should go -- a general improvement to how the
stage-2 MMU gets loaded on VHE systems, which ought to benefit other
implementations too. Only if we can't extract a generalization should
we even think about something implementation-specific, IMO.

--
Thanks,
Oliver