On Sun, 3 Sept 2023 at 14:48, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> If measurements support it then this looks like a nice optimization.

Well, it seems to work, but when I profile it to see if the end result looks reasonable, the profile data is swamped by the return mispredicts from CPU errata workarounds, and to a smaller degree by the clac/stac overhead of SMAP.

So it does seem to work - at least it boots here and everything looks normal - and it does seem to generate good code, but the profiles look kind of sad.

I also note that we do a lot of stupid pointless 'statx' work that is then entirely thrown away for a regular stat() system call.

Part of it is actual extra work to set the statx fields. But a lot of it is that even if we didn't do that, the 'statx' code has made 'struct kstat' much bigger, and made our code footprint much worse.

Of course, even without the useless statx overhead, 'struct kstat' itself ends up having a lot of padding because of how 'struct timespec64' looks. It might actually be good to split it explicitly into seconds and nanoseconds just for padding.

Because that all blows 'struct kstat' up to 160 bytes here.

And to make it all worse, the statx code has caused all the filesystems to grow their own 'getattr()' code just to fill in that worthless garbage, when it used to be that you could rely on 'generic_fillattr()'.

I'm looking at ext4_getattr(), for example, and I think *all* of it is due to statx - which, to a close approximation, nobody cares about, and which is a specialty system call for a couple of users.

And again - the indirect branches have gone from being "a cycle or two" to being pipeline stalls and mispredicts. So not using just a plain 'generic_fillattr()' is *expensive*.

Sad. Because the *normal* stat() family of system calls are some of the most important ones out there. Very much unlike statx().

Linus
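
(To make the padding point above concrete, here is a small standalone userspace sketch. It is a simplified illustration, not the actual <linux/stat.h> or <linux/time64.h> definitions: a struct with four embedded timespec64-shaped timestamps versus the same data with the 32-bit nanosecond fields split out so they can pack together. On a typical 64-bit build the first form takes 64 bytes and the split form 48.)

    /* Simplified illustration only - not the real kernel structures. */
    #include <stdio.h>
    #include <stdint.h>

    struct ts64 {             /* shaped like struct timespec64 on 64-bit */
            int64_t tv_sec;
            long    tv_nsec;  /* only ever holds 0..999999999, but takes 8 bytes */
    };

    struct stat_times_embedded {   /* four timestamps, 16 bytes each = 64 bytes */
            struct ts64 atime, mtime, ctime, btime;
    };

    struct stat_times_split {      /* split fields let the 32-bit nsec values pack */
            int64_t  atime_sec, mtime_sec, ctime_sec, btime_sec;
            uint32_t atime_nsec, mtime_nsec, ctime_nsec, btime_nsec;
    };

    int main(void)
    {
            printf("embedded timespec64: %zu bytes\n", sizeof(struct stat_times_embedded));
            printf("split sec/nsec:      %zu bytes\n", sizeof(struct stat_times_split));
            return 0;
    }

The same packing idea, applied to the four timestamps inside 'struct kstat', is what the "split it explicitly into seconds and nanoseconds" suggestion would buy.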