On Mon, Jun 28, 2021 at 09:13:29AM -0700, Thiago Macieira wrote:
> On Monday, 28 June 2021 08:27:24 PDT Peter Zijlstra wrote:
> > > That's what cpuid is for. With GCC function multi-versioning or equivalent
> > > manually-rolled solutions, you can get exactly what you're asking for.
> >
> > Right, lots of self-modifying code solutions there, some of which can be
> > linker driven, some not. In the kernel we use alternative() to replace
> > short code sequences depending on CPUID.
> >
> > Userspace *could* do the same, rewriting code before first execution is
> > fairly straight forward.
>
> Userspace shouldn't do SMC. It's bad enough that JITs without caching exist,
> but having pure paged code is better. Pure pages are shared as needed by the
> kernel.

I don't feel that strongly; if SMC gets you measurable performance gains,
go for it. If you're short on memory, buy more.

> All you need is a simple bit test. You can then either branch to different
> code paths or write to a function pointer so it'll go there directly the next
> time. You can also choose to load different plugins depending on what CPU
> features were found.

Both bit tests and indirect function calls suffer the extra memory load,
which is not free.

> Consequence: CPU feature checking is done *very* early, often before main().

For the linker based ones, yes. IIRC the ifunc() attribute is particularly
useful here.
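
For illustration, the "bit test, then write to a function pointer" scheme quoted above could look roughly like the sketch below. It is not taken from the thread: the names sum(), sum_generic(), sum_avx2(), sum_resolve() and cpu_has_avx2() are hypothetical, AVX2 is an arbitrary choice of feature, and the feature test assumes GCC or Clang on x86 (other targets simply fall back to the generic path).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline kernel. */
static uint64_t sum_generic(const uint32_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Hypothetical AVX2 variant. The body is a placeholder that reuses the
 * generic loop; real code would be compiled with AVX2 enabled for this
 * function only. */
static uint64_t sum_avx2(const uint32_t *v, size_t n)
{
    return sum_generic(v, n);
}

static uint64_t sum_resolve(const uint32_t *v, size_t n);

/* Starts out pointing at the resolver and is overwritten on first call. */
static uint64_t (*sum_impl)(const uint32_t *, size_t) = sum_resolve;

/* The "simple bit test": assumes the GCC/Clang x86 builtins. */
static int cpu_has_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* First call lands here: pick an implementation, store it, forward the
 * call. The store is a plain write, so this sketch is not safe against
 * concurrent first calls. */
static uint64_t sum_resolve(const uint32_t *v, size_t n)
{
    sum_impl = cpu_has_avx2() ? sum_avx2 : sum_generic;
    return sum_impl(v, n);
}

/* Public entry point: every call loads sum_impl from memory before the
 * indirect call, which is the extra load mentioned in the reply. */
uint64_t sum(const uint32_t *v, size_t n)
{
    return sum_impl(v, n);
}
```

After the first call to sum(), sum_impl no longer points at the resolver, so later calls skip the feature test but still pay for the pointer load. With GCC's ifunc attribute the resolver would instead be run by the dynamic linker when the symbol is relocated, which is why that variant does its feature check before main().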