On Sat, Jul 22, 2017 at 1:25 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> I played with some clever changes such as limiting the copy to 48 bytes,
> disabling the memset and the like but I could not get a strong enough
> signal to say that any one change removed the extra or a clear part of
> it 20ns.

What CPU did you use?

Because the SMAP bit in particular matters. The field-by-field copies are
extremely slow on modern CPUs that implement SMAP, unless you also use
the special "unsafe_put_user()" code (or the nasty old put_user_ex() code
that some of the x86 signal code uses).

So one of the advantages of just copy_to_user() ends up being visible only
on Broadwell+ (or whatever the SMAP cutoff is).

             Linus
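
To make the SMAP point concrete, here is a rough userspace model of the three copy styles mentioned above. It is only a sketch, not kernel code: every model_* name and struct model_info are made up for illustration, and the counter simply tallies how many times the user-access window (STAC/CLAC on x86 with SMAP) would be opened and closed. It assumes put_user() toggles the window around each store, while unsafe_put_user() and copy_to_user() share a single window.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only: none of these are the real kernel helpers. */
struct model_info {             /* hypothetical three-field structure */
	int signo;
	int err;
	int code;
};

static int window_open;         /* models EFLAGS.AC: 1 while user access is allowed */
static unsigned long toggles;   /* number of STAC/CLAC pairs "executed" */

static void model_access_begin(void)
{
	assert(!window_open);
	window_open = 1;            /* stands in for STAC */
}

static void model_access_end(void)
{
	assert(window_open);
	window_open = 0;            /* stands in for CLAC */
	toggles++;
}

/* Modeled on put_user(): opens and closes the window around one store. */
static void model_put_user(int val, int *dst)
{
	model_access_begin();
	*dst = val;
	model_access_end();
}

/* Modeled on unsafe_put_user(): store only, caller must hold the window open. */
static void model_unsafe_put_user(int val, int *dst)
{
	assert(window_open);
	*dst = val;
}

/* Modeled on copy_to_user(): one window around a bulk copy. */
static void model_copy_to_user(void *dst, const void *src, size_t n)
{
	model_access_begin();
	memcpy(dst, src, n);
	model_access_end();
}

/* Field-by-field with put_user(): one toggle pair per field. */
unsigned long copy_field_by_field(struct model_info *to, const struct model_info *from)
{
	toggles = 0;
	model_put_user(from->signo, &to->signo);
	model_put_user(from->err, &to->err);
	model_put_user(from->code, &to->code);
	return toggles;
}

/* Field-by-field with unsafe_put_user(): one toggle pair for all fields. */
unsigned long copy_fields_batched(struct model_info *to, const struct model_info *from)
{
	toggles = 0;
	model_access_begin();
	model_unsafe_put_user(from->signo, &to->signo);
	model_unsafe_put_user(from->err, &to->err);
	model_unsafe_put_user(from->code, &to->code);
	model_access_end();
	return toggles;
}

/* Whole-structure copy: one toggle pair regardless of field count. */
unsigned long copy_whole(struct model_info *to, const struct model_info *from)
{
	toggles = 0;
	model_copy_to_user(to, from, sizeof(*to));
	return toggles;
}
```

In this model the put_user() style pays one toggle pair per field, while the batched and whole-structure styles pay one pair in total. On a CPU without SMAP the toggles do not exist at all, which would be consistent with the difference only showing up on newer hardware.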