On Tue, Apr 26, 2022 at 5:14 PM Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> So when we enter the function, we shift %rcx to get the number of
> qword-sized quantities to zero:
>
>   SYM_FUNC_START(clear_user_original)
>           mov %rcx,%rax
>           shr $3,%rcx             # qwords

Yes. But that's what we do for "rep stosq" too, for all the same reasons.

> but when we encounter the fault here, we return *%rcx* - not %rcx << 3
> - latter being the *bytes* leftover which we *actually* need to return
> when we encounter the #PF.

Yes, but:

> So, we need to shift back when we fault during the qword-sized zeroing,
> i.e., full function below, see label 3 there.

No.

The problem is that you're using the wrong exception type. Thanks for
posting the whole thing, because that makes it much more obvious.

You have the exception table entries switched.

You should have

        _ASM_EXTABLE_TYPE_REG(0b, 3b, EX_TYPE_UCOPY_LEN8, %rax)
        _ASM_EXTABLE_UA(2b, 3b)

and not need that label '4' at all.

Note how that "_ASM_EXTABLE_TYPE_REG" thing is literally designed to do

        %rcx = 8*%rcx+%rax

in the exception handler.

Of course, to get that regular _ASM_EXTABLE_UA(2b, 3b) to work, you
need to have the final byte count in %rcx, not in %rax, so that means
the "now do the rest of the bytes" case should have done something like

        movl %eax,%ecx
2:      movb $0,(%rdi)
        inc %rdi
        decl %ecx
        jnz 2b

instead.

Yeah, yeah, you could also use that _ASM_EXTABLE_TYPE_REG thing for the
second exception point, keep %rcx at zero and the count in %eax, and
depend on that whole "%rcx = 8*%rcx+%rax" fixing it up, knowing that if
an exception does *not* happen, %rcx will be zero from the qword loop.

But that really seems much too subtle for me - why not just keep things
in %rcx, and try to make this look as much as possible like the
"rep stosq + rep stosb" case?

And finally: I still think that those fancy exception table things are
*much* too fancy, much too subtle, and much too complicated.

So I'd actually prefer to get rid of them entirely, make the code use
the "no register changes" exception, and make the exception handler do
a proper site-specific fixup.

At that point, you can get rid of all the "mask bits early" logic, get
rid of all the extraneous 'test' instructions, and make it all look
something like below.

NOTE! I've intentionally kept the %eax thing using 32-bit instructions -
smaller encoding, and only the low three bits matter, so why move/mask
the full 64 bits?

NOTE2! Entirely untested. But I tried to make the normal code do minimal
work, and then fix things up more in the exception case. So it just
keeps the original count in the 32 bits in %eax until it wants to test
it, and then uses the 'andl' to both mask and test. And the exception
case knows that, so it masks there too.

I dunno. But I really think that whole new _ASM_EXTABLE_TYPE_REG and
EX_TYPE_UCOPY_LEN8 was unnecessary.

                 Linus

SYM_FUNC_START(clear_user_original)
        movl %ecx,%eax          # save count; only the low 3 bits are used later
        shrq $3,%rcx            # qwords
        jz .Lrest_bytes

        # do the qwords first
        .p2align 4
.Lqwords:
        movq $0,(%rdi)
        lea 8(%rdi),%rdi
        dec %rcx
        jnz .Lqwords

.Lrest_bytes:
        andl $7,%eax            # rest bytes
        jz .Lexit

        # now do the rest bytes
.Lbytes:
        movb $0,(%rdi)
        inc %rdi
        decl %eax
        jnz .Lbytes

.Lexit:
        xorl %eax,%eax
        RET                     # leftover byte count is in %rcx (0 on success)

.Lqwords_exception:
        # convert qwords back into bytes to return to caller
        shlq $3, %rcx
        andl $7, %eax           # %eax still holds the unmasked original count here
        addq %rax,%rcx
        jmp .Lexit

.Lbytes_exception:
        movl %eax,%ecx          # remaining tail bytes already counted down in %eax
        jmp .Lexit

        _ASM_EXTABLE_UA(.Lqwords, .Lqwords_exception)
        _ASM_EXTABLE_UA(.Lbytes, .Lbytes_exception)
SYM_FUNC_END(clear_user_original)
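
To make the "%rcx = 8*%rcx+%rax" arithmetic concrete, here is a minimal
user-space C sketch (not kernel code) of what the EX_TYPE_UCOPY_LEN8
fixup computes, assuming - as in the patch under discussion, which masks
%rax early - that %rax holds only the 0..7 tail bytes when the qword
loop faults:

        #include <assert.h>

        /*
         * Model of the EX_TYPE_UCOPY_LEN8 fixup described above: on a
         * fault the handler computes %rcx = 8*%rcx + %rax.  Here rcx
         * is the number of qwords still to clear and rax the 0..7
         * tail bytes, so the result is the number of bytes left
         * uncleared.
         */
        static unsigned long ucopy_len8_fixup(unsigned long rcx, unsigned long rax)
        {
                return 8 * rcx + rax;
        }

        int main(void)
        {
                unsigned long count = 29;       /* 3 qwords + 5 tail bytes */

                /* Fault with 2 qwords still to go: 2*8 + 5 = 21 bytes left. */
                assert(ucopy_len8_fixup(2, count & 7) == 21);
                return 0;
        }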
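
And to sanity-check the two site-specific fixups in the rewritten
function, a small user-space C model of its control flow (again a
sketch, not kernel code; fault_at is a hypothetical parameter standing
in for the first offset whose store takes a #PF).  It returns what %rcx
would hold on exit, counting a qword whose store faults as entirely
uncleared, just like the assembly does:

        #include <assert.h>
        #include <stddef.h>

        /* Returns the number of bytes NOT cleared, as %rcx would. */
        static size_t clear_user_model(size_t count, size_t fault_at)
        {
                size_t eax = count;             /* movl %ecx,%eax */
                size_t rcx = count >> 3;        /* shrq $3,%rcx  # qwords */
                size_t rdi = 0;                 /* destination offset */

                while (rcx) {                           /* .Lqwords */
                        if (fault_at < rdi + 8)         /* movq $0,(%rdi) faults */
                                /* .Lqwords_exception: shl, mask, add */
                                return (rcx << 3) + (eax & 7);
                        rdi += 8;
                        rcx--;
                }

                eax &= 7;                               /* andl $7,%eax */
                while (eax) {                           /* .Lbytes */
                        if (fault_at == rdi)            /* movb $0,(%rdi) faults */
                                return eax;             /* .Lbytes_exception */
                        rdi++;
                        eax--;
                }
                return 0;                               /* .Lexit */
        }

        int main(void)
        {
                assert(clear_user_model(29, (size_t)-1) == 0);  /* no fault */
                assert(clear_user_model(29, 9) == 21);  /* 2nd qword faults */
                assert(clear_user_model(29, 25) == 4);  /* 2nd tail byte faults */
                return 0;
        }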