Re: [PATCH bpf-next v2 03/15] bpf: Support new sign-extension mov insns

On Tue, 2023-07-18 at 18:17 -0700, Yonghong Song wrote:
[...]
> > > > +static void emit_movsx_reg(u8 **pprog, int num_bits, bool is64, u32 dst_reg,
> > > > +			   u32 src_reg)
> > > > +{
> > > > +	u8 *prog = *pprog;
> > > > +
> > > > +	if (is64) {
> > > > +		/* movs[b,w,l]q dst, src */
> > > > +		if (num_bits == 8)
> > > > +			EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbe,
> > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > +		else if (num_bits == 16)
> > > > +			EMIT4(add_2mod(0x48, src_reg, dst_reg), 0x0f, 0xbf,
> > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > +		else if (num_bits == 32)
> > > > +			EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x63,
> > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > +	} else {
> > > > +		/* movs[b,w]l dst, src */
> > > > +		if (num_bits == 8) {
> > > > +			EMIT4(add_2mod(0x40, src_reg, dst_reg), 0x0f, 0xbe,
> > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > 
> > Nit: As far as I understand 4-126 Vol. 2B of [1]
> >       the 0x40 prefix (REX prefix) is optional here
> >       (same as implemented below for num_bits == 16).
> 
> I think the 0x40 prefix is at least needed if a register is from R8 - R15?

Yes, please see below.

> I use this website to do asm/disasm experiments and did
> try various combinations with first 8 and later 8 registers
> and it seems correct results are generated.

It seems all roads lead to that web-site, I used it as well :)
Today I learned that the following could be used:

  echo 'movsx rax,ax' | as -o /dev/null -aln -msyntax=intel -mnaked-reg
  
Which opens a road to scripting experiments.

> > 
> > [1] https://cdrdv2.intel.com/v1/dl/getContent/671200
> > 
> > 
> > > > +		} else if (num_bits == 16) {
> > > > +			if (is_ereg(dst_reg) || is_ereg(src_reg))
> > > > +				EMIT1(add_2mod(0x40, src_reg, dst_reg));
> > > > +			EMIT3(add_2mod(0x0f, src_reg, dst_reg), 0xbf,
> > 
> > Nit: Based on the same manual, I don't understand why
> >       add_2mod(0x0f, src_reg, dst_reg) is used; '0x0f' should suffice
> >       (but I tried it both ways and it works...).
> 
>  From the above online assembler website.
> 
> But I will check the doc to see whether it can be simplified.
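
One possible explanation for why both forms produce the same bytes:
add_2mod() only ORs bit 0 and bit 2 into its first argument, and both
bits are already set in 0x0f, so the call leaves the opcode byte
unchanged. The sketch below models this with hypothetical helpers
is_ereg_hw() and add_2mod_hw(). As an assumption made for brevity, they
take x86-64 hardware register numbers (0..15), whereas the JIT's own
helpers take BPF register numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: registers are x86-64 hardware numbers, so "extended"
 * simply means r8..r15. The JIT's is_ereg() checks BPF registers.
 */
static bool is_ereg_hw(uint32_t reg)
{
	return reg >= 8;
}

/*
 * Modeled on the JIT's add_2mod(): set bit 0 if r1 is extended and
 * bit 2 if r2 is extended. For a REX prefix these are REX.B and REX.R.
 */
static uint8_t add_2mod_hw(uint8_t byte, uint32_t r1, uint32_t r2)
{
	if (is_ereg_hw(r1))
		byte |= 1;
	if (is_ereg_hw(r2))
		byte |= 4;
	return byte;
}
```

With these definitions, add_2mod_hw(0x0f, r1, r2) returns 0x0f for any
register pair, while add_2mod_hw(0x40, r1, r2) yields 0x40, 0x41, 0x44
or 0x45 depending on which operands are extended.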

I tried all combinations of r0..r9 for 64/32-bit destinations,
32/16/8 sources [1]:
- 0x40 based prefix is generated if any of the following is true:
  - dst is 64 bit
  - dst is ereg
  - src is ereg
  - dst is 32-bit and src is 'sil' (part of 'rsi', used for r2)
    (!) This one is surprising, and the website shows the same results.
        For example `movsx eax,sil` is encoded as `40 0F BE C6`,
        disassembling `0F BE C6` (w/o prefix) gives `movsx eax,dh`.
- opcodes:
  - 63      64-bit dst, 32-bit src
  - 0F BF   64-bit dst, 16-bit src
  - 0F BE   64-bit dst,  8-bit src
  - 0F BF   32-bit dst, 16-bit src (same as 64-bit dst)
  - 0F BE   32-bit dst,  8-bit src (same as 64-bit dst)
  
Script is at [2] (it is not particularly interesting, but here it is
in case you want to tweak it).

[1] https://gist.github.com/eddyz87/94b35fd89f023c43dd2480e196b28ea1
[2] https://gist.github.com/eddyz87/60991379c547df11d30fa91901862227
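
The prefix and opcode rules listed above can be sketched as a small
standalone encoder. This is only an illustration, not the patch's
emit_movsx_reg(): encode_movsx() is a hypothetical name, and it takes
x86-64 hardware register numbers (0 = rax, 6 = rsi, 8 = r8, ...) rather
than BPF register numbers. It also generalizes the 'sil' observation to
all 8-bit sources with encodings 4..7, since without a REX prefix those
encodings select ah/ch/dh/bh instead of spl/bpl/sil/dil.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode "movsx dst, src" (register to register)
 * into out[], which must hold at least 4 bytes.
 * Returns the number of bytes written, or 0 for unsupported operands.
 */
static size_t encode_movsx(uint8_t *out, int src_bits, bool dst64,
			   unsigned int dst, unsigned int src)
{
	size_t n = 0;
	uint8_t rex = 0x40;
	bool need_rex = dst64;

	if (dst > 15 || src > 15)
		return 0;
	/* A 32-bit source only makes sense with a 64-bit destination. */
	if (src_bits != 8 && src_bits != 16 && !(src_bits == 32 && dst64))
		return 0;

	if (dst64)
		rex |= 0x08;	/* REX.W: 64-bit destination */
	if (dst >= 8) {
		rex |= 0x04;	/* REX.R extends ModRM.reg (dst) */
		need_rex = true;
	}
	if (src >= 8) {
		rex |= 0x01;	/* REX.B extends ModRM.rm (src) */
		need_rex = true;
	}
	/* Without REX, 8-bit encodings 4..7 mean ah/ch/dh/bh. */
	if (src_bits == 8 && src >= 4 && src <= 7)
		need_rex = true;

	if (need_rex)
		out[n++] = rex;
	if (src_bits == 32) {
		out[n++] = 0x63;
	} else {
		out[n++] = 0x0f;
		out[n++] = src_bits == 16 ? 0xbf : 0xbe;
	}
	/* ModRM: mod = 11 (register direct), reg = dst, rm = src */
	out[n++] = 0xC0 | ((dst & 7) << 3) | (src & 7);
	return n;
}
```

For example, encode_movsx(buf, 8, false, 0, 6) corresponds to
`movsx eax,sil` and writes `40 0F BE C6`, matching the encoding quoted
above.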

> > > > +			      add_2reg(0xC0, src_reg, dst_reg));
> > > > +		}
> > > > +	}
> > > > +
> > > > +	*pprog = prog;
> > > > +}
[...]




