2010/4/8 Gleb Natapov <gleb@xxxxxxxxxx>:
> On Wed, Apr 07, 2010 at 03:25:10PM +0900, Yoshiaki Tamura wrote:
>> 2010/4/6 Gleb Natapov <gleb@xxxxxxxxxx>:
>> > On Tue, Apr 06, 2010 at 01:11:23PM +0900, Yoshiaki Tamura wrote:
>> >> Hi.
>> >>
>> >> When handle_io() is called, rip is currently advanced *before* the I/O is
>> >> actually handled by qemu in userland. While implementing Kemari for
>> >> KVM (http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg25141.html), mainly in
>> >> userland qemu, we encountered a problem: synchronizing the VCPU state
>> >> before qemu handles the I/O is too late, because rip has already been
>> >> advanced in KVM. Although we worked around this issue with a temporary
>> >> hack, I would like to ask a few questions about skip_emulated_instruction().
>> >>
>> >> 1. Does rip need to be advanced before qemu handles the I/O?
>> > In current kvm.git, rip is advanced before the I/O is handled by qemu only
>> > in the case of the "out" instruction. From an architectural point of view I
>> > think that's OK, since on real HW you can't guarantee that the I/O will take
>> > effect before the instruction pointer is advanced. It is done this way
>> > because we want "out" emulation to be really fast, so we skip the x86 emulator.
>>
>> Thanks for your reply.
>>
>> If advancing rip later doesn't break the behavior of devices or
>> introduce a slowdown, I would like that to be done.
>>
> Devices could not care less what value the rip register currently has.
> Why does it matter for your code?

My code, Kemari, is a mechanism that synchronizes VMs to achieve fault
tolerance. It transfers the whole VM state upon events such as disk or
network output, so that the secondary server can continue running after a
hardware failure. Think of it as continuous live migration. I've
implemented this feature in userland qemu, which calls the live migration
function whenever it detects output from the device emulators.
http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg25022.html

The problem is that I need to transfer the VM state as it was just *before*
the output to the devices. Otherwise the VM state has already moved on, and
after failover some I/O didn't work as I expected. I tracked down this issue
and found that rip had already been advanced in KVM, so transferring this
VCPU state was meaningless.

I'm planning to post the Kemari patch set soon, but I would like to solve
this rip issue first. If there is no drawback, I'm happy to work on it and
post a patch.

>> >> 2. If not, is it possible to divide skip_emulated_instruction() into
>> >> rec_emulated_instructions(), which remembers next_rip, and
>> >> skip_emulated_instruction(), which actually advances rip?
>> > Currently only the emulator can call userspace to do I/O, so when
>> > userspace returns after an I/O exit, control is handed back to the
>> > emulator unconditionally. The "out" instruction skips the emulator, but
>> > there is nothing to do after userspace returns, so the regular cpu loop is
>> > executed. If we want to advance rip only after userspace has executed the
>> > I/O done by "out", we need to distinguish who requested the I/O (the
>> > emulator or kvm_fast_pio_out()) and call different code depending on who
>> > that was. It can be done by having a callback that (if not null) is called
>> > on return from userspace.
>>
>> Your suggestion is to introduce a callback entry and, instead of
>> calling kvm_rip_write(), set that entry before calling kvm_fast_pio_out(),
>> then check the entry upon return from userspace, correct?
>>
> Something like that, yes.

OK. Let me work on that.

>> According to the comment in x86.c, for an "out" instruction
>> vcpu->arch.pio.count is set to 0 to skip the emulator.
>> To call kvm_fast_pio_out(), "!string" and "!in" must hold.
>> If we check vcpu->arch.pio.count, "string" and "in" on return
>> from userspace, can't we tell who requested the I/O, the emulator
>> or kvm_fast_pio_out()?
>>
> Maybe, but the callback approach is much cleaner. "string" and "in" can
> hold stale data, for instance.

I see. I was thinking that could be a trade-off against introducing a new
variable. I'll take the callback approach first, and reconsider later if
necessary.

>
>> >> 3. svm has next_rip, but when it is 0, a nop is emulated. Can this be
>> >> modified to continue without emulating a nop when next_rip is 0?
>> >>
>> > I don't see where a nop is emulated if next_rip is 0. As far as I can see,
>> > when next_rip == 0 the instruction at rip is decoded to figure out its
>> > length, and then rip is advanced by the instruction length. Anyway,
>> > next_rip is an svm-only thing.
>>
>> Sorry, I didn't understand the code well enough.
>>
>> static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>> {
>> ...
>>         if (!svm->next_rip) {
>>                 if (emulate_instruction(vcpu, 0, 0, EMULTYPE_SKIP) !=
>>                                 EMULATE_DONE)
>>                         printk(KERN_DEBUG "%s: NOP\n", __func__);
>>                 return;
>>         }
>>
>> Since the printk says NOP, I thought emulate_instruction was doing so...
>>
>> The reason I asked about next_rip is that I was hoping to use this
>> field to advance rip only after userspace has executed the I/O done by
>> "out": if next_rip is non-zero, call kvm_rip_write(). I would also
>> introduce next_rip to vmx, if that is workable, because vmx currently
>> uses a local variable for rip.
>>
>> Yoshi
>
> --
> Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html