On 03/29/2017 01:29 PM, Al Viro wrote:
> On Wed, Mar 29, 2017 at 01:08:12PM -0700, Vineet Gupta wrote:
>
>> Hi Al,
>>
>> Thanks for taking this up. It seems ARC was missing the INLINE_COPY_*
>> switch, likely because of the two variants (inline/out-of-line) we
>> already have. I've added a patch for that (attached too) - boot tested
>> the series on ARC.
>
> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually
> a win.

Just to be clear, your series does this for every architecture.

> It's probably arch-dependent and it would be nice if somebody compared
> performance with and without inlining those... ARC, in particular, has
> __arc_copy_{to,from}_user() inlining a whole lot, even in case of
> non-constant size, and your patch, AFAICS, will inline all of it in
> *all* cases.

Yes, we do inline all of it. The non-constant case is actually the
simpler one: a plain byte loop.

	"	mov.f	lp_count, %0	\n"	/* loop count = n (.f sets flags) */
	"	lpnz	3f		\n"	/* zero-overhead loop while non-zero */
	"	ldb.ab	%1, [%3, 1]	\n"	/* load byte, post-increment src */
	"1:	stb.ab	%1, [%2, 1]	\n"	/* store byte, post-increment dst */
	"	sub	%0, %0, 1	\n"	/* bytes remaining-- */

Doing it out of line (3 args to marshal) would be 4 instructions at the
call site anyway.

For constant sizes there's a laddered copy: 16-byte blocks plus the 1-15
straggler bytes. We do "manual" constant propagation there so the
straggler part is optimized away at compile time (sketch at the end of
this mail). But yes, all of this is emitted inline.

> It might end up being a win, but that's not a priori obvious... Do you
> have any profiling results in that area?

Unfortunately not at the moment. The reason for adding the out-of-line
variant was not so much performance as reducing the footprint for the
-Os case (for some customer, I think).
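
For anyone following along, this is roughly the shape of the generic
INLINE_COPY_FROM_USER switch as I read the series. It's a sketch of the
_copy_from_user() wrapper assuming kernel headers, not the exact code,
so treat the details (the access_ok() signature in particular) as
illustrative:

	#ifdef INLINE_COPY_FROM_USER
	static inline unsigned long
	_copy_from_user(void *to, const void __user *from, unsigned long n)
	{
		unsigned long res = n;

		if (likely(access_ok(VERIFY_READ, from, n)))
			res = raw_copy_from_user(to, from, n);
		if (unlikely(res))	/* fault: zero the uncopied tail */
			memset(to + (n - res), 0, res);
		return res;
	}
	#else
	extern unsigned long
	_copy_from_user(void *to, const void __user *from, unsigned long n);
	#endif

With the #define present, every caller eats the whole body inline;
without it, callers get a plain call and the body lives once, out of
line, in lib/.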
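And since the "laddered" part may not be obvious, here is a hypothetical
plain-C sketch of what the constant-size path does. The real ARC version
is inline asm with word accesses; ladder_copy() and the memcpy() calls
here are just for illustration:

	#include <string.h>

	/* Copy 16-byte blocks, then the 1-15 straggler bytes.  Since n
	 * is a compile-time constant at every inlined call site, each
	 * straggler test below folds to a known true/false and the dead
	 * arms disappear: that's the "manual" constant propagation. */
	static inline void ladder_copy(void *dst, const void *src,
				       unsigned long n)
	{
		char *to = dst;
		const char *from = src;
		unsigned long blocks = n >> 4;	/* whole 16-byte blocks */
		unsigned long rem = n & 0xf;	/* stragglers, 0..15 */

		while (blocks--) {
			memcpy(to, from, 16);
			to += 16;
			from += 16;
		}
		if (rem & 8) { memcpy(to, from, 8); to += 8; from += 8; }
		if (rem & 4) { memcpy(to, from, 4); to += 4; from += 4; }
		if (rem & 2) { memcpy(to, from, 2); to += 2; from += 2; }
		if (rem & 1)
			*to = *from;
	}

For n == 21, say, blocks == 1 and rem == 5, so only the 4-byte and
1-byte arms survive; everything else compiles away.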