Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Sat, Jul 22, 2017 at 1:25 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> I played with some clever changes such as limiting the copy to 48 bytes,
>> disabling the memset and the like but I could not get a strong enough
>> signal to say that any one change removed the extra 20ns or a clear
>> part of it.
>
> What CPU did you use? Because the SMAP bit in particular matters.
>
> The field-by-field copies are extremely slow on modern CPUs that
> implement SMAP, unless you also use the special "unsafe_put_user()"
> code (or the nasty old put_user_ex() code that some of the x86 signal
> code uses).
>
> So one of the advantages of just copy_to_user() ends up being visible
> only on Broadwell+ (or whatever the SMAP cutoff is).

Good point.  The CPU I was testing on was an AMD A10.  I don't actually
have a CPU that supports SMAP handy.

If you would like, I can post the minimal patches and benchmark so that
anyone who is interested can reproduce this for themselves.

I suspect that if the difference is down to only 20ns without SMAP, this
will definitely be a performance improvement in the presence of SMAP.

Eric
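
For readers following the SMAP point, here is a rough user-space model of
why field-by-field copies cost more when SMAP is enabled. It is only a
sketch, not kernel code: struct demo_info, the model_* helpers and the
smap_toggles counter are all made up for illustration, and the counter
merely stands in for the STAC/CLAC instructions that bracket each user
access window. It assumes each put_user()-style store opens and closes its
own window, while a copy_to_user()-style bulk copy, or a run of
unsafe_put_user()-style stores inside one begin/end pair, pays for a
single window.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a siginfo-like structure; not the kernel layout. */
struct demo_info {
	int signo;
	int err;
	int code;
	long value;
};

/* Counts modelled STAC/CLAC executions; nothing privileged happens here. */
static unsigned long smap_toggles;

static void model_access_begin(void) { smap_toggles++; } /* models STAC */
static void model_access_end(void)   { smap_toggles++; } /* models CLAC */

/* Models put_user(): every single store opens and closes its own window. */
#define MODEL_PUT_USER(val, ptr)		\
	do {					\
		model_access_begin();		\
		*(ptr) = (val);			\
		model_access_end();		\
	} while (0)

/* Field-by-field copy: one STAC/CLAC pair per field. */
static void copy_fieldwise(struct demo_info *dst, const struct demo_info *src)
{
	assert(dst && src);
	MODEL_PUT_USER(src->signo, &dst->signo);
	MODEL_PUT_USER(src->err, &dst->err);
	MODEL_PUT_USER(src->code, &dst->code);
	MODEL_PUT_USER(src->value, &dst->value);
}

/* Models copy_to_user(): one window around the whole structure. */
static void copy_bulk(struct demo_info *dst, const struct demo_info *src)
{
	assert(dst && src);
	model_access_begin();
	memcpy(dst, src, sizeof(*dst));
	model_access_end();
}

/* Models unsafe_put_user(): per-field stores sharing a single window. */
static void copy_fieldwise_unsafe(struct demo_info *dst,
				  const struct demo_info *src)
{
	assert(dst && src);
	model_access_begin();
	dst->signo = src->signo;
	dst->err = src->err;
	dst->code = src->code;
	dst->value = src->value;
	model_access_end();
}
```

With four fields, the modelled field-by-field path toggles eight times and
the other two paths toggle twice. On a CPU without SMAP those toggles do
not happen at all, which fits a measurement taken on such a machine
showing only a small difference.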