On Mon, Jul 24, 2017 at 10:43:34AM -0700, Linus Torvalds wrote:
> On Sat, Jul 22, 2017 at 1:25 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
> > I played with some clever changes such as limiting the copy to 48 bytes,
> > disabling the memset and the like but I could not get a strong enough
> > signal to say that any one change removed the extra or a clear part of
> > it 20ns.
>
> What CPU did you use? Because the SMAP bit in particular matters.
>
> The field-by-field copies are extremely slow on modern CPU's that
> implement SMAP, unless you also use the special "unsafe_put_user()"
> code (or the nasty old put_user_ex() code that some of the x86 signal
> code uses).
>
> So one of the advantages of just copy_to_user() ends up being visible
> only on Broadwell+ (or whatever the SMAP cutoff is).

Guys, could you take a look at vfs.git#work.siginfo?  I've been pretty much
buried lately (and probably will be for several more weeks - long-distance
moves *suck*), so that thing got stalled, but it might be worth a look.

The code generated for copy_siginfo_to_user() in it looks reasonably good:
we don't copy more than we need, and all copying to userland is done by
copy_to_user() - one call per call of copy_siginfo_to_user(), so SMAP crap
is not an issue.

The next thing I hope to do is converting the compat side of that thing to
the same; that got stalled.

Al "Buried in boxes" Viro...
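
For illustration only, here is a rough userspace sketch (not kernel code) of why one copy_to_user() per call sidesteps the SMAP cost. On SMAP-capable x86, each put_user() has to open and close the user-access window (STAC/CLAC) around its store, while a single bulk copy pays that once. Everything below is a hypothetical model: struct fake_siginfo, model_put_user(), model_copy_to_user() and the toggle counter are made up for the example and are not the kernel's siginfo layout or uaccess API. It counts window toggles rather than measuring time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a siginfo-like record; NOT the kernel layout. */
struct fake_siginfo {
	int signo;
	int err;
	int code;
	int pid;
	int uid;
};

/* Number of simulated user-access windows opened (one STAC/CLAC pair each). */
static unsigned long window_toggles;

/* Model of put_user(): one window per field store. */
static void model_put_user(int *dst, int val)
{
	window_toggles++;	/* stands in for stac; store; clac */
	*dst = val;
}

/* Model of copy_to_user(): one window for the whole copy. */
static void model_copy_to_user(void *dst, const void *src, size_t len)
{
	window_toggles++;	/* stands in for stac; bulk copy; clac */
	memcpy(dst, src, len);
}

/* Field-by-field copy; returns how many windows were opened. */
unsigned long copy_field_by_field(struct fake_siginfo *to,
				  const struct fake_siginfo *from)
{
	window_toggles = 0;
	model_put_user(&to->signo, from->signo);
	model_put_user(&to->err, from->err);
	model_put_user(&to->code, from->code);
	model_put_user(&to->pid, from->pid);
	model_put_user(&to->uid, from->uid);
	return window_toggles;
}

/* Single bulk copy; returns how many windows were opened. */
unsigned long copy_in_one_call(struct fake_siginfo *to,
			       const struct fake_siginfo *from)
{
	window_toggles = 0;
	model_copy_to_user(to, from, sizeof(*from));
	return window_toggles;
}
```

Calling both helpers on the same source record leaves identical destination contents; the field-by-field variant reports one window per field, the bulk variant reports one in total. The model says nothing about absolute timings, only about how many times the access window has to be toggled.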