On Thu, Jun 28, 2012 at 09:10:39AM -0500, Anthony Liguori wrote:
> On 06/26/2012 02:34 PM, Marcelo Tosatti wrote:
> >On Sat, Jun 23, 2012 at 12:55:49AM +0200, Jan Kiszka wrote:
> >>Should have declared this [RFC] in the subject and CC'ed kvm...
> >>
> >>On 2012-06-23 00:45, Jan Kiszka wrote:
> >>>This sketches a possible path to get rid of the iothread lock on vmexits
> >>>in KVM mode. On x86, the in-kernel irqchip has to be used because we
> >>>otherwise need to synchronize APIC and other per-cpu state accesses
> >>>that could be changed concurrently. Not yet fully analyzed is the NMI
> >>>injection path in the absence of an APIC.
> >>>
> >>>s390x should be fine without specific locking as their pre/post-run
> >>>callbacks are empty. Power requires locking for the pre-run callback.
> >>>
> >>>This patch is untested, but a similar version was successfully used in
> >>>an x86 setup with a network I/O path that needed no central iothread
> >>>locking anymore (required special MMIO exit handling).
> >>>---
> >>> kvm-all.c         |   18 ++++++++++++++++--
> >>> target-i386/kvm.c |    7 +++++++
> >>> target-ppc/kvm.c  |    4 ++++
> >>> 3 files changed, 27 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/kvm-all.c b/kvm-all.c
> >>>index f8e4328..9c3e26f 100644
> >>>--- a/kvm-all.c
> >>>+++ b/kvm-all.c
> >>>@@ -1460,6 +1460,8 @@ int kvm_cpu_exec(CPUArchState *env)
> >>>         return EXCP_HLT;
> >>>     }
> >>>
> >>>+    qemu_mutex_unlock_iothread();
> >>>+
> >>>     do {
> >>>         if (env->kvm_vcpu_dirty) {
> >>>             kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
> >>>@@ -1476,14 +1478,16 @@ int kvm_cpu_exec(CPUArchState *env)
> >>>              */
> >>>             qemu_cpu_kick_self();
> >>>         }
> >>>-        qemu_mutex_unlock_iothread();
> >>>
> >>>         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
> >>>
> >>>-        qemu_mutex_lock_iothread();
> >>>         kvm_arch_post_run(env, run);
> >>>
> >>>+        /* TODO: push coalesced mmio flushing to the point where we access
> >>>+         * devices that are using it (currently VGA and E1000).
> >>>+         */
> >>>+        qemu_mutex_lock_iothread();
> >>>         kvm_flush_coalesced_mmio_buffer();
> >>>+        qemu_mutex_unlock_iothread();
> >>>
> >>>         if (run_ret < 0) {
> >>>             if (run_ret == -EINTR || run_ret == -EAGAIN) {
> >>>@@ -1499,19 +1503,23 @@ int kvm_cpu_exec(CPUArchState *env)
> >>>         switch (run->exit_reason) {
> >>>         case KVM_EXIT_IO:
> >>>             DPRINTF("handle_io\n");
> >>>+            qemu_mutex_lock_iothread();
> >>>             kvm_handle_io(run->io.port,
> >>>                           (uint8_t *)run + run->io.data_offset,
> >>>                           run->io.direction,
> >>>                           run->io.size,
> >>>                           run->io.count);
> >>>+            qemu_mutex_unlock_iothread();
> >>>             ret = 0;
> >>>             break;
> >>>         case KVM_EXIT_MMIO:
> >>>             DPRINTF("handle_mmio\n");
> >>>+            qemu_mutex_lock_iothread();
> >>>             cpu_physical_memory_rw(run->mmio.phys_addr,
> >>>                                    run->mmio.data,
> >>>                                    run->mmio.len,
> >>>                                    run->mmio.is_write);
> >>>+            qemu_mutex_unlock_iothread();
> >>>             ret = 0;
> >>>             break;
> >>>         case KVM_EXIT_IRQ_WINDOW_OPEN:
> >>>@@ -1520,7 +1528,9 @@ int kvm_cpu_exec(CPUArchState *env)
> >>>             break;
> >>>         case KVM_EXIT_SHUTDOWN:
> >>>             DPRINTF("shutdown\n");
> >>>+            qemu_mutex_lock_iothread();
> >>>             qemu_system_reset_request();
> >>>+            qemu_mutex_unlock_iothread();
> >>>             ret = EXCP_INTERRUPT;
> >>>             break;
> >>>         case KVM_EXIT_UNKNOWN:
> >>>@@ -1533,11 +1543,15 @@ int kvm_cpu_exec(CPUArchState *env)
> >>>             break;
> >>>         default:
> >>>             DPRINTF("kvm_arch_handle_exit\n");
> >>>+            qemu_mutex_lock_iothread();
> >>>             ret = kvm_arch_handle_exit(env, run);
> >>>+            qemu_mutex_unlock_iothread();
> >>>             break;
> >>>         }
> >>>     } while (ret == 0);
> >>>
> >>>+    qemu_mutex_lock_iothread();
> >>>+
> >>>     if (ret < 0) {
> >>>         cpu_dump_state(env, stderr, fprintf, CPU_DUMP_CODE);
> >>>         vm_stop(RUN_STATE_INTERNAL_ERROR);
> >>>diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> >>>index 0d0d8f6..0ad64d1 100644
> >>>--- a/target-i386/kvm.c
> >>>+++ b/target-i386/kvm.c
> >>>@@ -1631,7 +1631,10 @@ void kvm_arch_pre_run(CPUX86State *env, struct kvm_run *run)
> >>>
> >>>     /* Inject NMI */
> >>>     if (env->interrupt_request & CPU_INTERRUPT_NMI) {
> >>>+        qemu_mutex_lock_iothread();
> >>>         env->interrupt_request &= ~CPU_INTERRUPT_NMI;
> >>>+        qemu_mutex_unlock_iothread();
> >>>+
> >>>         DPRINTF("injected NMI\n");
> >>>         ret = kvm_vcpu_ioctl(env, KVM_NMI);
> >>>         if (ret < 0) {
> >>>@@ -1641,6 +1644,8 @@ void kvm_arch_pre_run(CPUX86State *env, struct kvm_run *run)
> >>>     }
> >>>
> >>>     if (!kvm_irqchip_in_kernel()) {
> >>>+        qemu_mutex_lock_iothread();
> >>>+
> >>>         /* Force the VCPU out of its inner loop to process any INIT requests
> >>>          * or pending TPR access reports. */
> >>>         if (env->interrupt_request &
> >>>@@ -1682,6 +1687,8 @@ void kvm_arch_pre_run(CPUX86State *env, struct kvm_run *run)
> >>>
> >>>         DPRINTF("setting tpr\n");
> >>>         run->cr8 = cpu_get_apic_tpr(env->apic_state);
> >>>+
> >>>+        qemu_mutex_unlock_iothread();
> >>>     }
> >>> }
> >>>
> >>>diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> >>>index c09cc39..60d91a5 100644
> >>>--- a/target-ppc/kvm.c
> >>>+++ b/target-ppc/kvm.c
> >>>@@ -471,6 +471,8 @@ void kvm_arch_pre_run(CPUPPCState *env, struct kvm_run *run)
> >>>     int r;
> >>>     unsigned irq;
> >>>
> >>>+    qemu_mutex_lock_iothread();
> >>>+
> >>>     /* PowerPC QEMU tracks the various core input pins (interrupt, critical
> >>>      * interrupt, reset, etc) in PPC-specific env->irq_input_state. */
> >>>     if (!cap_interrupt_level &&
> >>>@@ -497,6 +499,8 @@ void kvm_arch_pre_run(CPUPPCState *env, struct kvm_run *run)
> >>>     /* We don't know if there are more interrupts pending after this. However,
> >>>      * the guest will return to userspace in the course of handling this one
> >>>      * anyways, so we will get a chance to deliver the rest. */
> >>>+
> >>>+    qemu_mutex_unlock_iothread();
> >>> }
> >>>
> >>> void kvm_arch_post_run(CPUPPCState *env, struct kvm_run *run)
> >>>
> >
> >The following plan would allow progressive conversion to parallel
> >operation.
> >
> >Jan mentioned the MMIO handler->MMIO handler deadlock in a private message.
> >
> >Jan: if there are recursive MMIO accesses, you can detect that and skip
> >such MMIO handlers in dev_can_use_lock()? Or blacklist them.
> >
> >What I mentioned earlier about "unsolved issues" are the possibilities
> >in the "iothread flow" (the belief is that these are better determined
> >with execution experience / tuning).
> >
> >OK, so, each plan document starts with its xyz.txt name. This is
> >compatible with TCG; the critical part for decent performance is not
> >dropping/acquiring locks (that is, only drop memmap_lock from the TCG
> >vcpu thread if the iothread requests so).
> >
> >This came out of discussions with Avi.
> >
> >immediate-plan.txt
> >------------------
> >
> >1. read_lock(memmap_lock)
> >2. MemoryRegionSection mrs = lookup(addr)
> >3. qom_ref(mrs.mr->dev)
> >4. read_unlock(memmap_lock)
> >
> >5. mutex_lock(dev->lock)
> >6. dispatch(&mrs, addr, data, size)
> >7. mutex_unlock(dev->lock)
>
> Just a detail, I don't think we should acquire a device specific
> lock in global code.

Why? The above is outdated; it should be:

5. lock_device(dev)
6. dispatch()
7. unlock_device(dev)

See lock_device below.

> Rather, I think we should acquire the global
> lock before dispatch unless a MemoryRegion is marked as being
> unlocked.

See lock_device and dev_can_use_lock below.

The per-device lock is used only if all services used by the dispatch
handler (such as timers, interrupts, network data transmission, etc.)
are prepared (converted) to skip entering the device's protected region
when executed from non-vcpu contexts, such as:

main_loop:
    select()
    mutex_lock(qemu_global_mutex)
    iohandler()
        if (mutex_trylock(dev->lock) == SUCCESS)
            push_data_to(dev)
        else
            continue processing of
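
For illustration only, here is a minimal C sketch of the dispatch path from
immediate-plan.txt above, with steps 5-7 replaced by lock_device()/unlock as
suggested in the reply. None of these helpers exist in the tree:
dev_can_use_lock(), lock_device(), the dev->lock field, the mr->dev
back-pointer and push_data_to() are assumptions taken from this thread, and
plain pthread primitives stand in for QEMU's own locking wrappers.

/* Sketch only: dev_can_use_lock(), lock_device(), DeviceState.lock,
 * MemoryRegion.dev and push_data_to() are assumptions taken from this
 * thread, not existing QEMU APIs; pthread primitives stand in for
 * QEMU's locking wrappers. */

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct DeviceState {
    pthread_mutex_t lock;    /* per-device lock (step 5 of the plan) */
    bool can_use_lock;       /* false for blacklisted/recursive MMIO handlers */
} DeviceState;

typedef struct MemoryRegion {
    DeviceState *dev;        /* back-pointer assumed by "qom_ref(mrs.mr->dev)" */
} MemoryRegion;

typedef struct MemoryRegionSection {
    MemoryRegion *mr;
} MemoryRegionSection;

/* Stand-ins for the primitives named in the plan. */
extern pthread_rwlock_t memmap_lock;
extern pthread_mutex_t qemu_global_mutex;
MemoryRegionSection lookup(uint64_t addr);
void qom_ref(DeviceState *dev);
void qom_unref(DeviceState *dev);
void dispatch(MemoryRegionSection *mrs, uint64_t addr, void *data, unsigned size);
void push_data_to(DeviceState *dev);

/* Devices whose handlers may recurse into other MMIO handlers (or are
 * otherwise not converted yet) fall back to the global lock. */
static bool dev_can_use_lock(DeviceState *dev)
{
    return dev->can_use_lock;
}

static pthread_mutex_t *lock_device(DeviceState *dev)
{
    pthread_mutex_t *m = dev_can_use_lock(dev) ? &dev->lock
                                               : &qemu_global_mutex;
    pthread_mutex_lock(m);
    return m;
}

/* vcpu context: steps 1-7 of immediate-plan.txt, with 5/7 replaced by
 * lock_device()/unlock as in the reply above. */
void mmio_dispatch(uint64_t addr, void *data, unsigned size)
{
    pthread_rwlock_rdlock(&memmap_lock);        /* 1. read_lock(memmap_lock) */
    MemoryRegionSection mrs = lookup(addr);     /* 2. lookup(addr)           */
    DeviceState *dev = mrs.mr->dev;
    qom_ref(dev);                               /* 3. keep the device alive  */
    pthread_rwlock_unlock(&memmap_lock);        /* 4. read_unlock            */

    pthread_mutex_t *m = lock_device(dev);      /* 5. lock_device(dev)       */
    dispatch(&mrs, addr, data, size);           /* 6. dispatch()             */
    pthread_mutex_unlock(m);                    /* 7. unlock_device(dev)     */

    qom_unref(dev);
}

/* Non-vcpu context (main loop iohandler): qemu_global_mutex is already
 * held, so only trylock the device lock and defer the work if a vcpu
 * currently owns it, as in the main_loop fragment above. */
void iohandler_push(DeviceState *dev)
{
    if (pthread_mutex_trylock(&dev->lock) == 0) {
        push_data_to(dev);
        pthread_mutex_unlock(&dev->lock);
    }
    /* else: leave the data queued and retry on the next main-loop pass */
}

The point of taking the reference in step 3 before dropping memmap_lock is
that the device cannot go away under a concurrent memory-map change while
the dispatch runs outside the map lock.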