From: Linus Torvalds
> Sent: 04 March 2023 20:48
>
> On Sat, Mar 4, 2023 at 12:31 PM Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> >
> > Good news: gcc provides a lot of control as to how it inlines string
> > ops, most notably:
> >         -mstringop-strategy=alg
>
> Note that any static decision is always going to be crap somewhere.
> You can make it do the "optimal" thing for any particular machine, but
> I consider that to be just garbage.
>
> What I would actually like to see is the compiler always generate an
> out-of-line call for the "big enough to not just do inline trivially"
> case, but do so with the "rep stosb/movsb" calling convention.

I think you also want it to differentiate between requests that
are known to be a whole number of words and ones that might
be byte sized.

For the kmalloc+memzero case you know you can zero a whole
number of words - so all the checks memset has to do for
byte length/alignment can be removed.

The same is true for memcpy() calls used for structure copies.
The compiler knows that aligned full-word copies can be done.
So it shouldn't be calling a function that has to redo the tests.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
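
To illustrate the distinction between word-sized and byte-sized requests, here is a rough user-space sketch in C. None of these names exist in the kernel or in gcc: zero_words(), zero_bytes(), struct example and copy_example() are all hypothetical, and the code only shows which checks a general routine has to make that a "whole number of aligned words" entry point could skip.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical word-only entry point: the caller promises that dst is
 * word-aligned and that the length is a whole number of words, so there
 * is no head or tail handling at all.
 */
static void zero_words(unsigned long *dst, size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		dst[i] = 0;
}

/*
 * Hypothetical general entry point: it knows nothing about alignment or
 * length, so it has to test for a misaligned head and a partial tail on
 * every call.
 */
static void zero_bytes(void *p, size_t len)
{
	unsigned char *b = p;
	const unsigned long zero = 0;

	/* Head: byte stores until the pointer is word-aligned. */
	while (len && ((uintptr_t)b % sizeof(unsigned long)) != 0) {
		*b++ = 0;
		len--;
	}

	/* Body: whole words (memcpy of one word avoids aliasing issues). */
	while (len >= sizeof(unsigned long)) {
		memcpy(b, &zero, sizeof(zero));
		b += sizeof(unsigned long);
		len -= sizeof(unsigned long);
	}

	/* Tail: whatever bytes are left over. */
	while (len) {
		*b++ = 0;
		len--;
	}
}

/* Hypothetical structure whose size is a whole number of words. */
struct example {
	unsigned long a;
	unsigned long b;
	void *c;
};

/*
 * A structure copy: size and alignment are known at compile time, so the
 * callee should not need to rediscover them at run time.
 */
static void copy_example(struct example *dst, const struct example *src)
{
	*dst = *src;
}
```

The head and tail loops in zero_bytes() are the tests that would be redundant whenever the compiler already knows the buffer is a whole number of aligned words, as in zero_words() or the structure assignment in copy_example().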