On 03/29/2017 01:29 PM, Al Viro wrote:
> On Wed, Mar 29, 2017 at 01:08:12PM -0700, Vineet Gupta wrote:
>
>> Hi Al,
>>
>> Thanks for taking this up. It seems ARC was missing the INLINE_COPY_*
>> switch, likely because of the two variants (inline/out-of-line) we
>> already have. I've added a patch for that (attached too) - boot tested
>> the series on ARC.
>
> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually
> a win.

Just to be clear, your series does this for every architecture.

> It's probably arch-dependent and it would be nice if somebody compared
> performance with and without inlining those... ARC, in particular, has
> __arc_copy_{to,from}_user() inlining a whole lot, even in case of
> non-constant size, and your patch, AFAICS, will inline all of it in
> *all* cases.

Yes, we do inline all of it. The non-constant case is actually the
simpler one: a plain byte loop.

	"	mov.f	lp_count, %0	\n"	/* loop count = n (.f sets flags) */
	"	lpnz	3f		\n"	/* zero-overhead loop while non-zero */
	"	ldb.ab	%1, [%3, 1]	\n"	/* load byte, post-increment src */
	"1:	stb.ab	%1, [%2, 1]	\n"	/* store byte, post-increment dst */
	"	sub	%0, %0, 1	\n"	/* bytes remaining-- */

Doing it out of line (3 args to marshal) would be 4 instructions at the
call site anyway.

For constant sizes there's a laddered copy: 16-byte blocks plus the 1-15
straggler bytes. We do "manual" constant propagation there so the
straggler part is optimized away at compile time (sketch at the end of
this mail). But yes, all of this is emitted inline.

> It might end up being a win, but that's not a priori obvious... Do you
> have any profiling results in that area?

Unfortunately not at the moment. The reason for adding the out-of-line
variant was not so much performance as reducing the footprint for the
-Os case (for some customer, I think).
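
For anyone following along, this is roughly the shape of the generic
INLINE_COPY_FROM_USER switch as I read the series. It's a sketch of the
_copy_from_user() wrapper assuming kernel headers, not the exact code,
so treat the details (the access_ok() signature in particular) as
illustrative:

	#ifdef INLINE_COPY_FROM_USER
	static inline unsigned long
	_copy_from_user(void *to, const void __user *from, unsigned long n)
	{
		unsigned long res = n;

		if (likely(access_ok(VERIFY_READ, from, n)))
			res = raw_copy_from_user(to, from, n);
		if (unlikely(res))	/* fault: zero the uncopied tail */
			memset(to + (n - res), 0, res);
		return res;
	}
	#else
	extern unsigned long
	_copy_from_user(void *to, const void __user *from, unsigned long n);
	#endif

With the #define present, every caller eats the whole body inline;
without it, callers get a plain call and the body lives once, out of
line, in lib/.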
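And since the "laddered" part may not be obvious, here is a hypothetical
plain-C sketch of what the constant-size path does. The real ARC version
is inline asm with word accesses; ladder_copy() and the memcpy() calls
here are just for illustration:

	#include <string.h>

	/* Copy 16-byte blocks, then the 1-15 straggler bytes.  Since n
	 * is a compile-time constant at every inlined call site, each
	 * straggler test below folds to a known true/false and the dead
	 * arms disappear: that's the "manual" constant propagation. */
	static inline void ladder_copy(void *dst, const void *src,
				       unsigned long n)
	{
		char *to = dst;
		const char *from = src;
		unsigned long blocks = n >> 4;	/* whole 16-byte blocks */
		unsigned long rem = n & 0xf;	/* stragglers, 0..15 */

		while (blocks--) {
			memcpy(to, from, 16);
			to += 16;
			from += 16;
		}
		if (rem & 8) { memcpy(to, from, 8); to += 8; from += 8; }
		if (rem & 4) { memcpy(to, from, 4); to += 4; from += 4; }
		if (rem & 2) { memcpy(to, from, 2); to += 2; from += 2; }
		if (rem & 1)
			*to = *from;
	}

For n == 21, say, blocks == 1 and rem == 5, so only the 4-byte and
1-byte arms survive; everything else compiles away.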