Re: [PATCH -tip 3/6 V4.1] x86: instruction decorder API

Masami Hiramatsu <mhiramat@xxxxxxxxxx> · Fri, 03 Apr 2009 20:37:37 -0400

Hi Peter,

H. Peter Anvin wrote:
> Masami Hiramatsu wrote:
>> Add x86 instruction decoder to arch-specific libraries. This decoder
>> can decode all x86 instructions into prefix, opcode, modrm, sib,
>> displacement and immediates. This can also show the length of
>> instructions.
>>
>> changes from v4:
>>  - make bitmap tables static.
> 
> Hi Masami,
> 
> On the surface the overall structure looks fine, but I have a couple of 
> concerns:
> 
> 1. is this meant to be able to decode userspace code or just kernel 
> code?  If it is supposed to be able to decode userspace code, is there a 
> reason you're not dealing with 16-bit or V86 mode code at all?  If not, 
> why are you including the 32-bit decoder in a 64-bit kernel (as well as 
> instructions which we're pretty much guaranteed to never use in the 
> kernel, such as ENTER.)

Actually, this aims to decode both of user space and kernel code.
At this point, it just needs to cover kernel code, because kprobes
just want to decode kernel binary.
However, this is just a starting point, uprobe developers want to
use it to decode user-space code. In that case, it needs to be
enhanced.

> 2. you're already not dealing with all existing three-byte opcode 
> spaces, nor with DREX or VEX encodings for upcoming processors.  This 
> doesn't matter so much for the kernel, but it does matter if this is 
> supposed to be used for user-space code.
> 
> 3. is there any need to deal with instruction set differences among 
> processors?  (Again, this depends on the usage model.)

Agreed. When it support decoding user-space code, it should
support all kind of instructions.

> 
> 4. you have a bunch of magic opcode constants all over the place.  This 
> means that as new instructions come in -- and they're going to be coming 
> in -- this is going to be hard to update.  It would be cleaner if we 
> could have an intermediate format that preprocesses down to all the 
> relevant tables and perhaps even some of the code rather than 
> open-coding everything in C.
> 
> This matters... for example you have:
> 
> +		} else if (opcode == 0xea /* jmp far seg:offs */) {
> +			__get_immptr(insn);
> 
> ... but nothing similar for opcode 0x9a.  This is extremely hard to spot 
> with this kind of structure.

Oops, that should be a bug. Hmm, I think we'd better bit-flags tables
for classifying opcodes.
Jim, can your INAT idea help this situation?

http://sources.redhat.com/ml/systemtap/2009-q2/msg00109.html

> 
> The more data-driven we can make it (without bloating the code too much) 
> the better off we are, I believe.
> 
> 	-hpa

Thank you for good advice!

-- 
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@xxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html