On 03/29/2017 04:42 PM, Al Viro wrote:
> On Wed, Mar 29, 2017 at 02:14:22PM -0700, Vineet Gupta wrote:
>
>>> BTW, I wonder if inlining all of the copy_{to,from}_user() is actually a win.
>>
>> Just to be clear, your series was doing this for everyone.
>
> Huh? It's just that most of architectures *were* inlining that;
> arc change was unintentional (copy_from_user/copy_to_user went
> uninlined, which your patch deals with), but it's not that I'm forcing
> inlining on every architecture out there.

That is correct - I didn't mean to say you changed it per-se, but that I saw
INLINE_COPY* all over the place but not for ARC :-)

>>> It might
>>> end up being a win, but that's not apriori obvious... Do you have any
>>> profiling results in that area?
>>
>> Unfortunately not at the moment. The reason for adding out-of-line variant was not
>> so much as performance but to improve the footprint for -Os case (some customer I
>> think).
>
> Just to make it clear - I'm less certain than Linus that uninlined is uniformly
> better, but I have a strong suspicion that on most architectures it *is*.
> And not just in terms of kernel size - I would expect better speed as well.
> The only reason why these knobs are there is that I want to separate the
> "who should switch to uninlined" from this series and allow for the possibility
> that for some architectures inlined will really turn out to be better.
> I do _not_ expect that there'll be many of those; if it turns out that there's
> none, I'll be only glad to make the guts of copy_{to,from}_user() always
> out of line.
>
> IOW your patch reverts an unintentional change of behaviour, but I really
> wonder if that (out-of-line guts of copy_{to,from}_user) isn't an overall
> win for arc. I've applied your patch, but it would be nice if you could
> arrange for testing with and without inlining and post the results. The
> same goes for all architectures; again, I would expect out-of-line to end up
> a win on most of them.

I guess I can in the next day or two, but mind you the inline version for ARC
is kind of special vs. other arches. We have this "manual" constant propagation
to elide the unrolled LD/ST for the 1-15 byte stragglers when @sz is constant.
In the out-of-line version we lose all of that, and the code needs to fall
through all the cases. We can possibly improve that by re-arranging the checks,
e.g. exiting early if there are no stragglers, etc.
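
To make the straggler point concrete, here is a rough user-space sketch in
plain C. It is not the actual ARC uaccess code: copy_tail() and sketch_copy()
are made-up names, memcpy() stands in for the unrolled LD/ST sequences, and
there is no fault handling at all. It only shows the shape of the checks.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the 1-15 byte straggler handling.
 * When this is inlined and sz is a compile-time constant, the compiler
 * can fold every "tail & N" test and keep only the moves that are needed.
 * When it lives out of line, each test is evaluated at run time.
 */
static inline __attribute__((always_inline))
void copy_tail(unsigned char *dst, const unsigned char *src, size_t sz)
{
	size_t tail = sz & 15;

	if (!tail)		/* early exit: no stragglers at all */
		return;

	if (tail & 8) { memcpy(dst, src, 8); dst += 8; src += 8; }
	if (tail & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }
	if (tail & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }
	if (tail & 1) { *dst = *src; }
}

/* Hypothetical wrapper: bulk copy in 16-byte multiples, then the tail. */
static inline void sketch_copy(void *dst, const void *src, size_t sz)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t bulk = sz & ~(size_t)15;

	memcpy(d, s, bulk);
	copy_tail(d + bulk, s + bulk, sz);
}
```

With a constant size such as sketch_copy(dst, src, 20), an optimizing compiler
is free to reduce the inlined tail to a single 4-byte move. The "if (!tail)"
test at the top is the kind of re-arrangement I mean for the out-of-line case:
16-byte multiples skip the four straggler checks entirely.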