Re: Simplify load_unaligned_zeropad() (was Re: [GIT PULL] Ceph updates for 5.20-rc1)

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Tue, 16 Aug 2022 10:57:45 -0700

On Tue, Aug 16, 2022 at 1:02 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > +#define EX_TYPE_ZEROPAD                      20 /* load ax from dx zero-padded */
>
> This comment is now woefully incorrect.

Yes it is. Will fix.

> > +     if (insn_decode(&insn, (void *) regs->ip, len, INSN_MODE_KERN))
> > +             return false;
>
> We have insn_decode_kernel() for exactly this (very) common case.

I did that originally, and then I undid it in disgust, because that
interface is too simple.

In particular, it just uses MAX_INSN_SIZE blindly. Which I didn't want
to do when I actually had the instruction size.

Yes, yes, I also check the decode size after-the-fact, but I didn't
want the decoder to even look at the invalid bytes.

This exception case is about the data being at the end of the page, I
wanted the fixup to be aware of code being at the end of a page too.

(And yeah, I'm not convinced that the decoder is that careful, but at
that point I feel it's a decoder issue, and not an issue with the code
I write).

> > +     if (insn.length != len)
> > +             return false;
> > +
> > +     if (insn.opcode.bytes[0] != 0x8b)
> > +             return false;
>
> I was wondering if we want something like MOV_INSN_OPCODE for 0x8b to
> enhance readability, otoh it's currently 0x8b all over the place, so
> whatever. At some point you gotta have the insn tables with you anyway.

Oh, I didn't even notice that we had another case of 0x8b checking.
But yeah, the MMIO decoding wants to see what kind of access it is.

But it wouldn't be MOV_INSN_OPCODE, it would have to be something like
MOV_WORD_INSN_MODRM_REG_OPCODE, because that's what people are
checking for - not just that it's a 'mov', but direction and size too.

And then you'd have to also decide whether you describe those
#define's using the Intel ordering or the one we actually use in our
asm. So now the symbolic names are ambiguous anyway, in ways that the
actual byte value isn't.

So yeah, I suspect it ends up just being an issue of "you have to have
the opcode tables in front of you anyway".

Because you also need to check that that's the only encoding for "mov"
(I checked, and yes, it is - there are other 'mov' encodings that move
directly from memory into %rax, but those are using absolute addresses
that don't make sense for a "this is an unaligned that might be a page
crosser")

Side note: now that I look at it, I note that the MMIO decoding
doesn't handle the absolute address case. It's not really relevant for
the kernel, but I could *imagine* that it is relevant in user mode,
and the SEV case actually does have a "decode and emulate user mode
instruction case".

Not a big issue. If some crazy user even maps IO at a fixed address,
and then uses a "mov %eax <-> moffset" instruction, the kernel
emulation will print out an error and refuse to emulate it.

                  Linus