On Mon, Dec 12, 2011 at 9:30 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 12/11/2011 12:25 PM, Christoffer Dall wrote:
>> In order to support KVM on an SMP host, it is necessary to initialize the
>> hypervisor on all CPUs, mostly by making sure each CPU gets its own
>> hypervisor stack and runs the HYP init code.
>>
>> We also take care of some missing locking of modifications to the
>> hypervisor page tables and ensure synchronized consistency between
>> virtual IRQ masks and wait_for_interrupt flags on the VCPUs.
>>
>> Note that this code doesn't handle CPU hotplug yet.
>> Note that this code doesn't support SMP guests.
>>
>> WARNING: This code is in development and guests do not fully boot on SMP
>> hosts yet.
>
> Damn, I just reviewed all that breakage.
>

so sorry...

>>
>>  	/* Misc. fields */
>> +	spinlock_t irq_lock;
>> +	u32 virt_irq;		/* HCR exception mask */
>>  	u32 wait_for_interrupts;
>
> Better to use atomics, IMO.

hmm, yeah, I guess the way to do it would be to have two fields - one
atomic field used for interrupt injection, which is read atomically in
the C-code into a plain u32 variable, which can then be copied directly
onto the hardware during the world-switch...

>
>> @@ -464,13 +466,27 @@ static int kvm_arch_vm_ioctl_irq_line(struct kvm *kvm,
>>
>>  	trace_kvm_irq_line(irq_level->irq % 2, irq_level->level, vcpu_idx);
>>
>> +	spin_lock(&vcpu->arch.irq_lock);
>>  	if (irq_level->level) {
>>  		vcpu->arch.virt_irq |= mask;
>> +
>> +		/*
>> +		 * Note that we grab the wq.lock before clearing the wfi flag
>> +		 * since this ensures that a concurrent call to kvm_vcpu_block
>> +		 * will either sleep before we grab the lock, in which case we
>> +		 * wake it up, or will never sleep due to
>> +		 * kvm_arch_vcpu_runnable being true (iow. this avoids having
>> +		 * to grab the irq_lock in kvm_arch_vcpu_runnable).
>> +		 */
>> +		spin_lock(&vcpu->wq.lock);
>>  		vcpu->arch.wait_for_interrupts = 0;
>> +
>>  		if (waitqueue_active(&vcpu->wq))
>> -			wake_up_interruptible(&vcpu->wq);
>> +			__wake_up_locked(&vcpu->wq, TASK_INTERRUPTIBLE);
>> +		spin_unlock(&vcpu->wq.lock);
>>  	} else
>>  		vcpu->arch.virt_irq &= ~mask;
>> +	spin_unlock(&vcpu->arch.irq_lock);
>
> This looks overly complicated with two levels of locks. x86 gets by
> with no locks, and a much more complicated interrupt architecture.
>
> My recommendation is:
> wait_for_interrupts is managed solely by the vcpu thread
> KVM_IRQ_LINE does a set_bit(, virt_irq) for the appropriate irq type,
> then IPI/wakeups the vcpu to make it examine both wait_for_interrupts
> and virt_irq.
>

this sounds pretty good to me. Something like this:

	if (irq_level->level) {
		set_bit(bit_nr, &vcpu->arch.irq_lines);
		smp_mb();
		wake_up_interruptible(&vcpu->wq);
	} else
		clear_bit(bit_nr, &vcpu->arch.irq_lines);

and the vcpu thread would clear the wait_for_interrupts flag if it ever
sees the mask field be non-zero?
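In other words, something like this on the vcpu side (just a sketch -
irq_lines is the atomic mask field from the snippet above, and the
exact placement of the check is a guess):

	/* in the vcpu run loop, e.g. after returning from kvm_vcpu_block() */
	if (vcpu->arch.irq_lines)
		vcpu->arch.wait_for_interrupts = 0;

	static int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
	{
		/*
		 * No locking needed: irq_lines is only modified with
		 * atomic set_bit/clear_bit from KVM_IRQ_LINE, and
		 * wait_for_interrupts is only written by the vcpu
		 * thread itself.
		 */
		return !vcpu->arch.wait_for_interrupts ||
		       vcpu->arch.irq_lines;
	}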
>> +
>> +static void cpu_init_hyp_mode(void *vector)
>> +{
>> +	unsigned long hyp_stack_ptr;
>> +	void *stack_page;
>> +
>> +	stack_page = __get_cpu_var(kvm_arm_hyp_stack_page);
>> +	hyp_stack_ptr = (unsigned long)stack_page + PAGE_SIZE;
>> +
>> +	cpu_set_vector(vector);
>> +
>> +	/*
>> +	 * Call initialization code
>> +	 */
>> +	asm volatile (
>> +		"mov r0, %[pgd_ptr]\n\t"
>> +		"mov r1, %[stack_ptr]\n\t"
>> +		"hvc #0\n\t" : :
>> +		[pgd_ptr] "r" (virt_to_phys(kvm_hyp_pgd)),
>> +		[stack_ptr] "r" (hyp_stack_ptr) :
>> +		"r0", "r1");
>> +}
>
> (slightly nicer is to allocate hyp_stack_ptr and pgd_ptr to "register
> asm("r0")" and "register asm("r1")" to avoid the extra mov instruction)
>

I agree.

>> @@ -522,47 +573,42 @@ static int init_hyp_mode(void)
>>  		return -ENOMEM;
>>
>>  	/*
>> -	 * Allocate stack page for Hypervisor-mode
>> +	 * Allocate stack pages for Hypervisor-mode
>>  	 */
>> -	kvm_arm_hyp_stack_page = (void *)__get_free_page(GFP_KERNEL);
>> -	if (!kvm_arm_hyp_stack_page) {
>> -		err = -ENOMEM;
>> -		goto out_free_pgd;
>> -	}
>> +	for_each_possible_cpu(cpu) {
>> +		void *stack_page;
>>
>> -	hyp_stack_ptr = (unsigned long)kvm_arm_hyp_stack_page + PAGE_SIZE;
>> +		stack_page = (void *)__get_free_page(GFP_KERNEL);
>
> Best to allocate this (and other per-cpu state) on the cpu's node.
>

why, for performance reasons? The code gets slightly more complicated,
since we have to pass the return value through the argument, so we
have to pass an opaque pointer to a struct or something like that.

>> +		if (!stack_page) {
>> +			err = -ENOMEM;
>> +			goto out_free_pgd;
>> +		}
>> +
>> +		per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page;
>> +	}
>>
>>  	init_phys_addr = virt_to_phys(__kvm_hyp_init);
>>  	init_end_phys_addr = virt_to_phys(__kvm_hyp_init_end);
>> +	BUG_ON(init_phys_addr & 0x1f);
>>
>>  	/*
>> -	 * Create identity mapping
>> +	 * Create identity mapping for the init code.
>>  	 */
>>  	hyp_identity_mapping_add(kvm_hyp_pgd,
>>  				 (unsigned long)init_phys_addr,
>>  				 (unsigned long)init_end_phys_addr);
>>
>> +	for_each_online_cpu(cpu) {
>> +		smp_call_function_single(cpu, cpu_init_hyp_mode,
>> +					 (void *)(long)init_phys_addr, 1);
>> +	}
>
> Need similar code for cpu hotplug. See kvm_cpu_hotplug() and
> kvm_arch_hardware_enable() which do all this for you.
>

so just to be sure, this will only be called for cpus that are
hotplugged, right? We still call cpu_init_hyp_mode for each cpu that's
online at this point. Do we need some locking to make sure the two
don't overlap (like should I grab the kvm_lock here)?
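If it does all happen through kvm_arch_hardware_enable(), I guess the
for_each_online_cpu() loop in init_hyp_mode() could go away completely
and become something like this (just a sketch of my understanding -
the generic code in virt/kvm/kvm_main.c seems to call this on each
online cpu at init time and again from its hotplug notifier, under its
own locking):

	int kvm_arch_hardware_enable(void *garbage)
	{
		/*
		 * Runs on the cpu being enabled, with interrupts
		 * disabled, so we can set up HYP mode for this cpu
		 * directly instead of using smp_call_function_single().
		 */
		cpu_init_hyp_mode((void *)(long)virt_to_phys(__kvm_hyp_init));
		return 0;
	}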
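And for the "register asm" variant of cpu_init_hyp_mode, I'm thinking
of something like this (untested sketch):

	static void cpu_init_hyp_mode(void *vector)
	{
		register unsigned long pgd_ptr asm("r0");
		register unsigned long hyp_stack_ptr asm("r1");

		cpu_set_vector(vector);

		/* assign after the call above so r0/r1 can't get clobbered */
		pgd_ptr = virt_to_phys(kvm_hyp_pgd);
		hyp_stack_ptr = (unsigned long)
			__get_cpu_var(kvm_arm_hyp_stack_page) + PAGE_SIZE;

		/*
		 * The arguments already live in r0/r1, so both movs and
		 * the clobber list go away.
		 */
		asm volatile("hvc #0" : : "r" (pgd_ptr), "r" (hyp_stack_ptr));
	}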