On Tue, 29 Aug 2023 at 12:45, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
>
> So I think I know how to fix it, but I'm going to sleep on it.

I think you can just skip the %r8 games, and do that

        leal (%rax,%rcx,8),%rcx

in the exception fixup code, since %rax will have the low bits of the byte count, and %rcx will have the remaining qword count.

We should also have some test-case for partial reads somewhere, but I have to admit that when I did the cleanup patches I just wrote some silly test myself (ie just doing a 'mmap()' and then reading/writing into the end of that mmap at different offsets). I didn't save that hacky thing, I'm afraid.

I also tried to figure out if there is any CPU we should care about that doesn't like 'rep movsq', but I think you are right that there really isn't. The "good enough" rep things were introduced in the PPro if I recall correctly, and while you could disable them in the BIOS, by the time Intel did 64-bit in Northwood (?) it was pretty much standard.

So yeah, no reason to have the unrolled loop at all, and I think your patch is fine conceptually, it just needs fixing and testing for the partial success case.

Oh, and you should also remove the clobbers of r8-r11 in the copy_user_generic() inline asm in <asm/uaccess_64.h> when you've fixed the exception handling. The only reason for those clobbers was that unrolled register use.

So only %rax ends up being a clobber for the rep_movs_alternative case, as far as I can tell.

              Linus
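PS: a minimal sketch of the kind of partial-read test meant above (untested, and /dev/zero plus the particular offsets are just placeholders): map two pages, make the second one PROT_NONE, then read() across the boundary so the user copy faults partway through and the syscall has to return a short count.

/* Partial user-copy test sketch: read() into a buffer that ends at a
 * PROT_NONE page, so copy_to_user() faults mid-copy and the read
 * should come back short rather than failing outright. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int fd = open("/dev/zero", O_RDONLY);

	if (map == MAP_FAILED || fd < 0)
		return 1;

	/* Second page becomes the fault trap for the user copy */
	mprotect(map + page, page, PROT_NONE);

	/* Ask for 100 bytes, but only 'off' of them are writable */
	for (long off = 1; off < 100; off += 17) {
		ssize_t ret = read(fd, map + page - off, 100);

		/* Expect a short read of roughly 'off' bytes */
		printf("offset %3ld: read() returned %zd\n", off, ret);
	}
	return 0;
}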