On Mon, Apr 18, 2022 at 3:15 AM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> Yah, wanted to singlestep that whole asm anyway to make sure it is good.
> And just started going through it - I think it can be even optimized a
> bit to use %rax for the rest bytes and decrement it into 0 eventually.

Ugh. If you do this, you need to have a big comment about how that
%rcx value gets fixed up with EX_TYPE_UCOPY_LEN (which basically ends
up doing "%rcx = %rcx*8+%rax" in ex_handler_ucopy_len() for the
exception case).

Plus you need to explain the xorl here:

> 3:
>         xorl %eax,%eax
>         RET

because with your re-written function it *looks* like %eax is already
zero, so - once again - this is about how the exception cases work and
get here magically.

So that needs some big comment about "if an exception happens, we jump
to label '3', and the exception handler will fix up %rcx, but we'll
need to clear %rax".

Or something like that.

But yes, that does look like it will work, it's just really subtle how
%rcx is zero for the 'rest bytes', and %rax is the byte count
remaining in addition.

                Linus
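
For illustration only, the fixup arithmetic described in the mail can be
sketched in user-space C. This is not the kernel's code: struct
regs_sketch, ucopy_len_fixup_sketch() and clear_rax_at_label_3_sketch()
are hypothetical names, and the scale factor of 8 is taken from the
"%rcx = %rcx*8+%rax" description above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the saved registers; NOT the kernel's pt_regs. */
struct regs_sketch {
	uint64_t rax;	/* trailing ("rest") bytes still outstanding */
	uint64_t rcx;	/* 8-byte words still outstanding */
};

/*
 * Sketch of the exception-time fixup the mail describes:
 * %rcx = %rcx * 8 + %rax, i.e. turn "words left" plus "rest bytes"
 * into a single byte count held in %rcx.
 */
static void ucopy_len_fixup_sketch(struct regs_sketch *r, uint64_t scale)
{
	r->rcx = r->rcx * scale + r->rax;
}

/*
 * Sketch of the "xorl %eax,%eax" at label 3: on the exception path
 * %rax may still hold a nonzero rest-byte count, so it has to be
 * cleared explicitly even though the normal path already counted it
 * down to zero.
 */
static void clear_rax_at_label_3_sketch(struct regs_sketch *r)
{
	r->rax = 0;
}
```

With 3 words and 5 rest bytes outstanding, the fixup leaves 3*8+5 = 29
in rcx. If the fault happens in the rest-byte loop instead, rcx is
already 0, so the same formula yields just the remaining rax value.
That is the subtle part the comment would need to spell out.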