On Tue, Apr 26, 2022 at 06:29:16PM -0700, Linus Torvalds wrote:
> Yeah, yeah, you could also use that _ASM_EXTABLE_TYPE_REG thing for
> the second exception point, and keep %rcx as zero, and keep it in
> %eax, and depend on that whole "%rcx = 8*%rcx+%rax" fixing it up, and
> knowing that if an exception does *not* happen, %rcx will be zero from
> the word-size loop.

Exactly!

> But that really seems much too subtle for me - why not just keep
> things in %rcx, and try to make this look as much as possible like the
> "rep stosq + rep stosb" case?
>
> And finally: I still think that those fancy exception table things are
> *much* too fancy, and much too subtle, and much too complicated.

This whole confusion goes to show that this really is too subtle. ;-\

I mean, it is probably fine for some simple asm where you want to have
the register fixup out of the way. But for something like this, where I
wanted the asm to be optimal but not tricky at the same time, that
"hidden fixup" might turn out to be not such a good idea.

And sure, yeah, we can make this work this way now and all, but after
time passes, swapping all that tricky stuff back in is going to be a
pain. That's why I'm leaving breadcrumbs everywhere I can.

> So I'd actually prefer to get rid of them entirely, and make the code
> use the "no register changes" exception, and make the exception
> handler do a proper site-specific fixup. At that point, you can get
> rid of all the "mask bits early" logic, get rid of all the extraneous
> 'test' instructions, and make it all look something like below.

Yap, and the named labels make it even clearer.

> NOTE! I've intentionally kept the %eax thing using 32-bit instructions
> - smaller encoding, and only the low three bits matter, so why
> move/mask full 64 bits?

Some notes below.

> NOTE2! Entirely untested.

I'll poke at it.

> But I tried to make the normal code do minimal work, and then fix
> things up in the exception case more.

Yap.

> So it just keeps the original count in the 32 bits in %eax until it
> wants to test it, and then uses the 'andl' to both mask and test. And
> the exception case knows that, so it masks there too.

Sure.

> I dunno. But I really think that whole new _ASM_EXTABLE_TYPE_REG and
> EX_TYPE_UCOPY_LEN8 was unnecessary.

I've pointed this out to Mr. Z. He likes them as complex as possible. :-P

> SYM_FUNC_START(clear_user_original)
>         movl %ecx,%eax

I'll add a comment here along the lines of: we only need the 32-bit
quantity of the size since we're dealing with the rest bytes, and thus
we save a REX byte in the opcode.

>         shrq $3,%rcx # qwords

Any particular reason for the explicit "q" suffix? The register already
denotes the opsize and the generated opcode is the same.

>         jz .Lrest_bytes
>
>         # do the qwords first
>         .p2align 4
> .Lqwords:
>         movq $0,(%rdi)
>         lea 8(%rdi),%rdi
>         dec %rcx

Like here, for example. I guess you wanna be explicit as to the
operation width...

>         jnz .Lqwords
>
> .Lrest_bytes:
>         andl $7,%eax # rest bytes
>         jz .Lexit
>
>         # now do the rest bytes
> .Lbytes:
>         movb $0,(%rdi)
>         inc %rdi
>         decl %eax
>         jnz .Lbytes
>
> .Lexit:
>         xorl %eax,%eax
>         RET
>
> .Lqwords_exception:
>         # convert qwords back into bytes to return to caller
>         shlq $3, %rcx
>         andl $7, %eax
>         addq %rax,%rcx
>         jmp .Lexit
>
> .Lbytes_exception:
>         movl %eax,%ecx
>         jmp .Lexit
>
>         _ASM_EXTABLE_UA(.Lqwords, .Lqwords_exception)
>         _ASM_EXTABLE_UA(.Lbytes, .Lbytes_exception)

Yap, I definitely like this one more. Straightforward.

Thx!

--
Regards/Gruss,
    Boris.
https://people.kernel.org/tglx/notes-about-netiquette
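
[ For reference, a minimal standalone snippet, not from the patch above,
  illustrating the two encoding points discussed in the thread: using a
  32-bit operand saves the REX prefix byte, and the explicit "q" suffix
  is redundant when a 64-bit register already fixes the operand size.
  The opcode bytes in the comments are what GNU as emits on x86-64 and
  can be checked with objdump -d: ]

        movl    %ecx,%eax       # 89 c8         2 bytes, no REX prefix
        movq    %rcx,%rax       # 48 89 c8      3 bytes, REX.W prefix needed

        shr     $3,%rcx         # 48 c1 e9 03   width implied by %rcx
        shrq    $3,%rcx         # 48 c1 e9 03   same encoding, suffix redundant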