On Wed, Apr 07, 2010 at 03:25:10PM +0900, Yoshiaki Tamura wrote:
> 2010/4/6 Gleb Natapov <gleb@xxxxxxxxxx>:
> > On Tue, Apr 06, 2010 at 01:11:23PM +0900, Yoshiaki Tamura wrote:
> >> Hi.
> >>
> >> When handle_io() is called, rip is currently proceeded *before* actually having
> >> I/O handled by qemu in userland. Upon implementing Kemari for
> >> KVM(http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg25141.html) mainly in
> >> userland qemu, we encountered a problem that synchronizing the content of VCPU
> >> before handling I/O in qemu is too late because rip is already proceeded in KVM,
> >> Although we avoided this issue with temporal hack, I would like to ask a few
> >> question on skip_emulated_instructions.
> >>
> >> 1. Does rip need to be proceeded before having I/O handled by qemu?
> > In current kvm.git rip is proceeded before I/O is handled by qemu only
> > in case of "out" instruction. From architecture point of view I think
> > it's OK since on real HW you can't guaranty that I/O will take effect
> > before instruction pointer is advanced. It is done like that because we
> > want "out" emulation to be real fast so we skip x86 emulator.
>
> Thanks for your reply.
>
> If proceeding rip later doesn't break the behavior of devices or
> introduce slow down, I would like that to be done.
>
A device could not care less about what value the rip register currently
has. Why does it matter for your code?

> >> 2. If no, is it possible to divide skip_emulated_instructions(), like
> >> rec_emulated_instructions() to remember to next_rip, and
> >> skip_emulated_instructions() to actually proceed the rip.
> > Currently only emulator can call userspace to do I/O, so after
> > userspace returns after I/O exit, control is handled back to emulator
> > unconditionally. "out" instruction skips emulator, but there is nothing
> > to do after userspace returns, so regular cpu loop is executed.
> > If we want to advance rip only after userspace executed I/O done by "out" we
> > need to distinguish who requested I/O (emulator or kvm_fast_pio_out())
> > and call different code depending on who that was. It can be done by
> > having a callback that (if not null) is called on return from userspace.
>
> Your suggestion is to introduce a callback entry, and instead of
> calling kvm_rip_write(), set it to the entry before calling
> kvm_fast_pio_out(), and check the entry upon return from the
> userspace, correct?
>
Something like that, yes.

> According to the comment in x86.c, when it was "out" instruction
> vcpu->arch.pio.count is set to 0 to skip the emulator.
> To call kvm_fast_pio_out(), "!string" and "!in" must be set.
> If we can check, vcpu->arch.pio.count, "string" and "in" on return
> from the userspace, can't we distinguish who requested I/O, emulator
> or kvm_fast_pio_out()?
>
Maybe, but the callback approach is much cleaner. "string" and "in" can
have stale data, for instance.

> >> 3. svm has next_rip but when it is 0, nop is emulated. Can this be modified to
> >> continue without emulating nop when next_rip is 0?
> >>
> > I don't see where nop is emulated if next_rip is 0. As far as I see in
> > case of next_rip==0 an instruction at rip is decoded to figure out its
> > length and then rip is advanced by instruction length. Anyway next_rip
> > is svm thing only.
>
> Sorry. I wasn't understanding the code enough.
>
> static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
> {
> ...
>         if (!svm->next_rip) {
>                 if (emulate_instruction(vcpu, 0, 0, EMULTYPE_SKIP) !=
>                                 EMULATE_DONE)
>                         printk(KERN_DEBUG "%s: NOP\n", __func__);
>                 return;
>         }
>
> Since the printk says NOP, I thought emulate_instruction was doing so...
>
> The reason I asked about next_rip is because I was hoping to use this
> entry to advance rip only after userspace executed I/O done by "out",
> like if next_rip is !0, call kvm_rip_write(), and introduce next_rip
> to vmx if it is usable because vmx is currently using local variable rip.
>
> Yoshi

--
			Gleb.
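
For illustration only, the callback idea discussed above (record the next rip in the "out" fast path, then commit it when control returns from userspace) could be modelled roughly as below. This is a standalone sketch, not kvm.git code: struct toy_vcpu, pending_rip, complete_userspace_io and all toy_* functions are hypothetical names invented for this example, and the real struct kvm_vcpu, kvm_fast_pio_out() and kvm_rip_write() are not used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a vcpu; not struct kvm_vcpu. */
struct toy_vcpu {
	unsigned long rip;          /* current instruction pointer */
	unsigned long pending_rip;  /* rip to commit once userspace finished the I/O */
	/* Called on return from userspace if not NULL (assumed field name). */
	void (*complete_userspace_io)(struct toy_vcpu *vcpu);
};

/* Completion handler for the fast "out" path: only now advance rip. */
static void toy_complete_fast_pio_out(struct toy_vcpu *vcpu)
{
	vcpu->rip = vcpu->pending_rip;
}

/*
 * Models the fast "out" path: instead of writing rip immediately,
 * remember where it should go and register the completion callback.
 */
static void toy_fast_pio_out(struct toy_vcpu *vcpu, unsigned long insn_len)
{
	/* This sketch assumes at most one outstanding I/O exit per vcpu. */
	assert(vcpu->complete_userspace_io == NULL);
	vcpu->pending_rip = vcpu->rip + insn_len;
	vcpu->complete_userspace_io = toy_complete_fast_pio_out;
}

/* Models re-entry after userspace handled the I/O exit. */
static void toy_return_from_userspace(struct toy_vcpu *vcpu)
{
	if (vcpu->complete_userspace_io) {
		void (*cb)(struct toy_vcpu *) = vcpu->complete_userspace_io;

		/* Clear first so a stale callback can never run twice. */
		vcpu->complete_userspace_io = NULL;
		cb(vcpu);
	}
}
```

In this model rip stays at the "out" instruction while userspace handles the exit, so vcpu state synchronized at that point still refers to the not-yet-retired instruction. Because the callback pointer itself says who requested the I/O, nothing has to be inferred from flags such as "string" and "in" that may hold stale data.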