On Sun, Sep 22, 2024 at 05:57:05AM -0700, Sean Christopherson wrote:
> On Tue, Aug 20, 2024, Keith Busch wrote:
> > To test, I executed the following program against a qemu emulated pci
> > device resource. Prior to this kernel patch, it would fail with
> >
> >   traps: vmovdq[378] trap invalid opcode ip:4006b2 sp:7ffe2f5bb680 error:0 in vmovdq[6b2,400000+1000]
>
> ...
>
> > +static const struct gprefix pfx_avx_0f_6f_0f_7f = {
> > +	N, I(Avx | Aligned, em_mov), N, I(Avx | Unaligned, em_mov),
> > +};
> > +
> > +static const struct opcode avx_0f_table[256] = {
> > +	/* 0x00 - 0x5f */
> > +	X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
> > +	/* 0x60 - 0x6F */
> > +	X8(N), X4(N), X2(N), N,
> > +	GP(SrcMem | DstReg | ModRM | Mov, &pfx_avx_0f_6f_0f_7f),
> > +	/* 0x70 - 0x7F */
> > +	X8(N), X4(N), X2(N), N,
> > +	GP(SrcReg | DstMem | ModRM | Mov, &pfx_avx_0f_6f_0f_7f),
> > +	/* 0x80 - 0xFF */
> > +	X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
> > +};
>
> Mostly as an FYI, we're likely going to run into more than just VMOVDQU sooner
> rather than later.  E.g. gcc-13 with -march=x86-64-v3 (which per Vitaly is now
> the default gcc behavior for some distros[*]) compiles this chunk from KVM
> selftests' kvm_fixup_exception():
>
> 	regs->rip = regs->r11;
> 	regs->r9 = regs->vector;
> 	regs->r10 = regs->error_code;
>
> into this monstrosity (which is clever, but oof).
>
>   405313:	c4 e1 f9 6e c8       	vmovq  %rax,%xmm1
>   405318:	48 89 68 08          	mov    %rbp,0x8(%rax)
>   40531c:	48 89 e8             	mov    %rbp,%rax
>   40531f:	c4 c3 f1 22 c4 01    	vpinsrq $0x1,%r12,%xmm1,%xmm0
>   405325:	49 89 6d 38          	mov    %rbp,0x38(%r13)
>   405329:	c5 fa 7f 45 00       	vmovdqu %xmm0,0x0(%rbp)
>
> I wouldn't be surprised if the same packing shenanigans get employed when generating
> code for a struct overlay of emulated MMIO.

Thanks for the notice. I'm hoping we can proceed with just the mov
instructions for now, unless someone already has a real use for these on
emulated MMIO.
Otherwise, we can cross that bridge when we get there.

As it is, if just the vmovdq[u,a] are okay, I have a follow-on for
vmovdqu64, though I'm currently having trouble adding AVX-512 registers.
Simply increasing the size of struct x86_emulate_ctxt appears to break
something even without trying to emulate those instructions. But I want
to wait to see if this first part is okay before spending too much time
on it.