On Wed, 2023-07-19 at 08:59 -0700, Fangrui Song wrote:
> On Wed, Jul 19, 2023 at 5:53 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > On Tue, 2023-07-18 at 18:17 -0700, Yonghong Song wrote:
> > [...]
> > > > > > +static void emit_movsx_reg(u8 **pprog, int num_bits, bool is64, u32 dst_reg,
> > > > > > +			   u32 src_reg)
> > > > > > +{
> > > > > > +	u8 *prog = *pprog;
> > > > > > +
> > > > > > +	if (is64) {
> > > > > > +		/* movs[b,w,l]q dst, src */
> > > > > > +		if (num_bits == 8)
> > > > > > +			EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbe,
> > > > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +		else if (num_bits == 16)
> > > > > > +			EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbf,
> > > > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +		else if (num_bits == 32)
> > > > > > +			EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x63,
> > > > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +	} else {
> > > > > > +		/* movs[b,w]l dst, src */
> > > > > > +		if (num_bits == 8) {
> > > > > > +			EMIT4(add_2mod(0x40, src_reg, dst_reg), 0x0f, 0xbe,
> > > > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > >
> > > > Nit: As far as I understand 4-126 Vol. 2B of [1]
> > > >      the 0x40 prefix (REX prefix) is optional here
> > > >      (same as implemented below for num_bits == 16).
> > >
> > > I think 0x40 prefix at least needed if register is from R8 - R15?
> >
> > Yes, please see below.
> >
> > > I use this website to do asm/disasm experiments and did
> > > try various combinations with first 8 and later 8 registers
> > > and it seems correct results are generated.
> >
> > It seems all roads lead to that web-site, I used it as well :)
> > Today I learned that the following could be used:
> >
> >   echo 'movsx rax,ax' | as -o /dev/null -aln -msyntax=intel -mnaked-reg
> >
> > Which opens a road to scripting experiments.
>
> This internal tool from llvm-project may also be useful:)
>
> llvm-mc -triple=x86_64 -show-inst -x86-asm-syntax=intel
> -output-asm-variant=1 <<< 'movsx rax, ax'

Thank you, this works (with -show-encoding).

>
> > > >
> > > > [1] https://cdrdv2.intel.com/v1/dl/getContent/671200
> > > >
> > > >
> > > > > > +		} else if (num_bits == 16) {
> > > > > > +			if (is_ereg(dst_reg) || is_ereg(src_reg))
> > > > > > +				EMIT1(add_2mod(0x40, src_reg, dst_reg));
> > > > > > +			EMIT3(add_2mod(0x0f, src_reg, dst_reg), 0xbf,
> > > >
> > > > Nit: Basing on the same manual I don't understand why
> > > >      add_2mod(0x0f, src_reg, dst_reg) is used, '0xf' should suffice
> > > >      (but I tried it both ways and it works...).
> > >
> > > From the above online assembler website.
> > >
> > > But I will check the doc to see whether it can be simplified.
> >
> > I tried all combinations of r0..r9 for 64/32-bit destinations,
> > 32/16/8 sources [1]:
> > - 0x40 based prefix is generated if any of the following is true:
> >   - dst is 64 bit
> >   - dst is ereg
> >   - src is ereg
> >   - dst is 32-bit and src is 'sil' (part of 'rsi', used for r2)
> >     (!) This one is surprising and web-site shows the same results.
> >         For example `movsx eax,sil` is encoded as `40 0F BE C6`,
> >         disassembling `0F BE C6` (w/o prefix) gives `movsx eax,dh`.

I think I found the place in the manual that explains the situation:

  3.7.2.1 Register Operands in 64-Bit Mode
  Register operands in 64-bit mode can be any of the following:
  - ...
  - 8-bit general-purpose registers: AL, BL, CL, DL, SIL, DIL, SPL, BPL,
    and R8B-R15B are available using REX prefixes;
    AL, BL, CL, DL, AH, BH, CH, DH are available without using REX
    prefixes.
  - ...

Vol. 1, page 3-21
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf

> > - opcodes:
> >   - 63    64-bit dst, 32-bit src
> >   - 0F BF 64-bit dst, 16-bit src
> >   - 0F BE 64-bit dst, 8-bit src
> >   - 0F BF 32-bit dst, 16-bit src (same as 64-bit dst)
> >   - 0F BE 32-bit dst, 8-bit src  (same as 64-bit dst)
> >
> > Script is at [2] (it is not particularly interesting, but in case if
> > you want to tweak it).
> >
> > [1] https://gist.github.com/eddyz87/94b35fd89f023c43dd2480e196b28ea1
> > [2] https://gist.github.com/eddyz87/60991379c547df11d30fa91901862227
> >
> > > > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > > > +		}
> > > > > > +	}
> > > > > > +
> > > > > > +	*pprog = prog;
> > > > > > +}
> > [...]
> >
>
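
For reference, the prefix rules and opcodes listed in this thread can be
sketched as standalone C. This is only an illustration under stated
assumptions: encode_movsx() is a hypothetical helper, not the kernel's
emit_movsx_reg(); it takes x86-64 hardware register numbers (0 = rax,
6 = rsi, 8..15 = r8..r15) rather than BPF register numbers; and it covers
register-to-register forms only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): encode "movsx dst, src"
 * for two registers into buf. Returns the number of bytes written, or 0
 * if the width combination is not a sign extension handled here.
 * dst/src are hardware register numbers 0..15 (assumption, see above).
 */
static size_t encode_movsx(uint8_t buf[4], unsigned int dst, unsigned int src,
			   int src_bits, bool is64)
{
	uint8_t rex = 0x40;
	bool need_rex = false;
	size_t n = 0;

	assert(dst < 16 && src < 16);

	/* Opcode 63 (32-bit source) is only used with a 64-bit destination. */
	if (src_bits != 8 && src_bits != 16 && !(src_bits == 32 && is64))
		return 0;

	if (is64) {		/* REX.W: 64-bit destination */
		rex |= 0x08;
		need_rex = true;
	}
	if (dst >= 8) {		/* REX.R extends ModRM.reg (dst) */
		rex |= 0x04;
		need_rex = true;
	}
	if (src >= 8) {		/* REX.B extends ModRM.rm (src) */
		rex |= 0x01;
		need_rex = true;
	}
	/* spl/bpl/sil/dil: without REX, 8-bit numbers 4..7 mean ah/ch/dh/bh. */
	if (src_bits == 8 && src >= 4 && src <= 7)
		need_rex = true;

	if (need_rex)
		buf[n++] = rex;
	if (src_bits == 32) {
		buf[n++] = 0x63;
	} else {
		buf[n++] = 0x0f;
		buf[n++] = (src_bits == 8) ? 0xbe : 0xbf;
	}
	/* ModRM, register-direct: mod = 11, reg = dst, rm = src. */
	buf[n++] = 0xc0 | ((dst & 7) << 3) | (src & 7);
	return n;
}
```

Under these assumptions, encode_movsx(buf, 0, 6, 8, false) produces
40 0F BE C6, which is the `movsx eax,sil` encoding quoted above, and the
same call with src = 0 and src_bits = 16 omits the prefix entirely.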