On Fri, 1 Sept 2023 at 08:20, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> cp_new_stat and the counterpart for statx can dodge this rep mov by
> filling user memory directly.

Yeah, they could be made to use the "unsafe_put_user()" machinery
these days, and we could go back to the good old days of avoiding the
extra temp buffer.

> I'm going to patch this, but first I want to address the bigger
> problem of glibc implementing fstat as newfstatat, demolishing perf of
> that op. In their defense currently they have no choice as this is the
> only exporter of the "new" struct stat. I'll be sending a long email
> to fsdevel soon(tm) with a proposed fix.

I wouldn't mind reinstating the "copy directly to user space rather
than go through a temporary buffer" approach for the stat family of
functions, so please do.

> So I was wondering if rep movsq is any worse than ERMS'ed rep movsb
> when there is no tail to handle and the buffer is aligned to a page,
> or more to the point if clear_page gets any benefit for going with
> movsb.

Hard to tell. 'movsq' is *historically* better, and likely still is on
all current microarchitectures. But 'movsb' is actually in many ways
easier for the CPU to optimize, because there's no question of the
sub-chunking if anything is not aligned just right.

              Linus