On Mon, 2017-05-29 at 23:48 +0200, Borislav Petkov wrote: > On Fri, May 05, 2017 at 11:17:07AM -0700, Ricardo Neri wrote: > > String instructions are special because in protected mode, the linear > > address is always obtained via the ES segment register in operands that > > use the (E)DI register. > > ... and DS for rSI. Right, I omitted this in the commit message. > > If we're going to account for both operands of string instructions with > two operands. > > Btw, LODS and OUTS use only DS:rSI as a source operand. So we have to be > careful with the generalization here. So if ES:rDI is the only seg. reg > we want, then we don't need to look at those insns... (we assume DS by > default). My intention with this function is to write a function that does only one thing: identify string instructions, irrespective of the operands they use. A separate function, resolve_seg_register, will have the logic to decide what to segment register to use based on the registers used as operands, whether we are looking at a string instruction, whether we have segment override prefixes and whether such overrides should be ignored. If I was to leave out string instructions from this function it should be renamed as is_string_instruction_non_lods_outs. In my opinion this separation makes the code more clear and I would end up having logic to decide which segment register to use in two places. Does it makes sense to you? > > ... > > > +/** > > + * is_string_instruction - Determine if instruction is a string instruction > > + * @insn: Instruction structure containing the opcode > > + * > > + * Return: true if the instruction, determined by the opcode, is any of the > > + * string instructions as defined in the Intel Software Development manual. > > + * False otherwise. > > + */ > > +static bool is_string_instruction(struct insn *insn) > > +{ > > + insn_get_opcode(insn); > > + > > + /* all string instructions have a 1-byte opcode */ > > + if (insn->opcode.nbytes != 1) > > + return false; > > + > > + switch (insn->opcode.bytes[0]) { > > + case INSB: > > + /* fall through */ > > + case INSW_INSD: > > + /* fall through */ > > + case OUTSB: > > + /* fall through */ > > + case OUTSW_OUTSD: > > + /* fall through */ > > + case MOVSB: > > + /* fall through */ > > + case MOVSW_MOVSD: > > + /* fall through */ > > + case CMPSB: > > + /* fall through */ > > + case CMPSW_CMPSD: > > + /* fall through */ > > + case STOSB: > > + /* fall through */ > > + case STOSW_STOSD: > > + /* fall through */ > > + case LODSB: > > + /* fall through */ > > + case LODSW_LODSD: > > + /* fall through */ > > + case SCASB: > > + /* fall through */ > > That "fall through" for every opcode is just too much. Also, you can use > the regularity of the x86 opcode space and do: > > case 0x6c ... 0x6f: /* INS/OUTS */ > case 0xa4 ... 0xa7: /* MOVS/CMPS */ > case 0xaa ... 0xaf: /* STOS/LODS/SCAS */ > return true; > default: > return false; > } > > And voila, there's your compact is_string_insn() function! :^) Thanks for the suggestion! It looks really nice. I will implement accordingly. Thanks and BR, Ricardo -- To unsubscribe from this list: send the line "unsubscribe linux-msdos" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html