On Tue, Aug 20, 2024, Keith Busch wrote:
> To test, I executed the following program against a qemu emulated pci
> device resource. Prior to this kernel patch, it would fail with
>
>   traps: vmovdq[378] trap invalid opcode ip:4006b2 sp:7ffe2f5bb680 error:0 in vmovdq[6b2,400000+1000]

...

> +static const struct gprefix pfx_avx_0f_6f_0f_7f = {
> +	N, I(Avx | Aligned, em_mov), N, I(Avx | Unaligned, em_mov),
> +};
> +
> +static const struct opcode avx_0f_table[256] = {
> +	/* 0x00 - 0x5f */
> +	X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
> +	/* 0x60 - 0x6F */
> +	X8(N), X4(N), X2(N), N,
> +	GP(SrcMem | DstReg | ModRM | Mov, &pfx_avx_0f_6f_0f_7f),
> +	/* 0x70 - 0x7F */
> +	X8(N), X4(N), X2(N), N,
> +	GP(SrcReg | DstMem | ModRM | Mov, &pfx_avx_0f_6f_0f_7f),
> +	/* 0x80 - 0xFF */
> +	X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
> +};

Mostly as an FYI, we're likely going to run into more than just VMOVDQU sooner
rather than later. E.g. gcc-13 with -march=x86-64-v3 (which per Vitaly is now
the default gcc behavior for some distros[*]) compiles this chunk from KVM
selftests' kvm_fixup_exception():

	regs->rip = regs->r11;
	regs->r9 = regs->vector;
	regs->r10 = regs->error_code;

into this monstrosity (which is clever, but oof):

  405313:   c4 e1 f9 6e c8          vmovq   %rax,%xmm1
  405318:   48 89 68 08             mov     %rbp,0x8(%rax)
  40531c:   48 89 e8                mov     %rbp,%rax
  40531f:   c4 c3 f1 22 c4 01       vpinsrq $0x1,%r12,%xmm1,%xmm0
  405325:   49 89 6d 38             mov     %rbp,0x38(%r13)
  405329:   c5 fa 7f 45 00          vmovdqu %xmm0,0x0(%rbp)

I wouldn't be surprised if the same packing shenanigans get employed when
generating code for a struct overlay of emulated MMIO.

[*] https://lore.kernel.org/all/20240920154422.2890096-1-vkuznets@xxxxxxxxxx