On Mon, Aug 06, 2012 at 12:28:05PM +0300, Avi Kivity wrote: > On 08/06/2012 11:58 AM, Gleb Natapov wrote: > > On Mon, Aug 06, 2012 at 11:50:20AM +0300, Avi Kivity wrote: > >> On 07/30/2012 05:38 PM, Gleb Natapov wrote: > >> > Optimize "rep ins" by allowing emulator to write back more than one > >> > datum at a time. Introduce new operand type OP_MEM_STR which tells > >> > writeback() that dst contains pointer to an array that should be written > >> > back as opposite to just one data element. > >> > > >> > } > >> > > >> > - memcpy(dest, rc->data + rc->pos, size); > >> > - rc->pos += size; > >> > + if (ctxt->rep_prefix && !(ctxt->eflags & EFLG_DF)) { > >> > + ctxt->dst.data = rc->data + rc->pos; > >> > + ctxt->dst.type = OP_MEM_STR; > >> > + ctxt->dst.count = (rc->end - rc->pos) / size; > >> > + rc->pos = rc->end; > >> > >> Should take into account the segment limit. > >> > > It does. During write back. pio_in_emulated() should linearize() address > > before calculating page boundary, but this is (minor) bug unrelated to the patch > > series. > > I see, yes, this problem preexists. > > However, in normal conditions, non-repeating instructions will not reach > the emulator at all since they will fault in the guest (or in the shadow > mmu, which will reflect the fault to the guest). Here, the first > iteration may fit in the segment but the second will not, so this will fail. > Correct. And this can happen with or without the patch series. > It's not a huge problem since no guest does this. > > >> > @@ -2732,7 +2747,7 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt, > >> > static void string_addr_inc(struct x86_emulate_ctxt *ctxt, int reg, > >> > struct operand *op) > >> > { > >> > - int df = (ctxt->eflags & EFLG_DF) ? -1 : 1; > >> > + int df = (ctxt->eflags & EFLG_DF) ? -op->count : op->count; > >> > > >> > register_address_increment(ctxt, &ctxt->regs[reg], df * op->bytes); > >> > op->addr.mem.ea = register_address(ctxt, ctxt->regs[reg]); > >> > @@ -3672,7 +3687,7 @@ static struct opcode opcode_table[256] = { > >> > I(DstReg | SrcMem | ModRM | Src2Imm, em_imul_3op), > >> > I(SrcImmByte | Mov | Stack, em_push), > >> > I(DstReg | SrcMem | ModRM | Src2ImmByte, em_imul_3op), > >> > - I2bvIP(DstDI | SrcDX | Mov | String, em_in, ins, check_perm_in), /* insb, insw/insd */ > >> > + I2bvIP(DstDI | SrcDX | Mov | String | Unaligned, em_in, ins, check_perm_in), /* insb, insw/insd */ > >> > >> Eww. > > This brings us back to the question what alignment check is doing in > > linearize :) > > It's checking alignment... > It either check it in a wrong place or we need to mark all instructions that do not care about alignment, so the patch is not "Eww" :) > Let's see how we would fix this mess. We need to move linearization > (and virt->phys translation) to the decode stage, or perhaps the > execution state, but before instruction dispatch. This would cause all > the various exceptions to be checked against before execution, and would > avoid double translation for RMW operands. > Execution state likely. String instruction works on segmented address for instance (address increment/decrement). May be there are others. > > >> > string_addr_inc(ctxt, VCPU_REGS_RDI, &ctxt->dst); > >> > > >> > if (ctxt->rep_prefix && (ctxt->d & String)) { > >> > + unsigned int count; > >> > struct read_cache *r = &ctxt->io_read; > >> > - register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX], -1); > >> > + if ((ctxt->d & SrcMask) == SrcSI) > >> > + count = ctxt->src.count; > >> > + else > >> > + count = ctxt->dst.count; > >> > >> Does this work correctly for 'rep movs' and friends? > >> > > (src|dst).count is initialized to 1 during decode, so anything that does > > not touch "count" behaves exactly like before. > > Ok. > > > -- > error compiling committee.c: too many arguments to function -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html