On Wed, 30 Aug 2023 at 07:03, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> Hand-rolled mov loops executing in this case are quite pessimal compared
> to rep movsq for bigger sizes. While the upper limit depends on uarch,
> everyone is well south of 1KB AFAICS and sizes bigger than that are
> common.
>
> While technically ancient CPUs may be suffering from rep usage, gcc has
> been emitting it for years all over kernel code, so I don't think this
> is a legitimate concern.
>
> Sample result from read1_processes from will-it-scale (4KB reads/s):
> before: 1507021
> after:  1721828 (+14%)

Ok, patch looks fine to me now.

So I applied this directly to my tree, since I was the one doing the
x86 memcpy cleanups that removed the REP_GOOD hackery anyway.

Linus