On 11/15/2012 04:08 AM, Marcelo Tosatti wrote: > We want to expose the pvclock shared memory areas, which > the hypervisor periodically updates, to userspace. > > For a linear mapping from userspace, it is necessary that > entire page sized regions are used for array of pvclock > structures. > > There is no such guarantee with per cpu areas, therefore move > to memblock_alloc based allocation. Ok. > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> For the concept: Acked-by: Glauber Costa <glommer@xxxxxxxxxxxxx> I do have one comment. > > Index: vsyscall/arch/x86/kernel/kvmclock.c > =================================================================== > --- vsyscall.orig/arch/x86/kernel/kvmclock.c > +++ vsyscall/arch/x86/kernel/kvmclock.c > @@ -23,6 +23,7 @@ > #include <asm/apic.h> > #include <linux/percpu.h> > #include <linux/hardirq.h> > +#include <linux/memblock.h> > > #include <asm/x86_init.h> > #include <asm/reboot.h> > @@ -39,7 +40,11 @@ static int parse_no_kvmclock(char *arg) > early_param("no-kvmclock", parse_no_kvmclock); > > /* The hypervisor will put information about time periodically here */ > -static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock); > +struct pvclock_aligned_vcpu_time_info { > + struct pvclock_vcpu_time_info clock; > +} __attribute__((__aligned__(SMP_CACHE_BYTES))); > + > +static struct pvclock_aligned_vcpu_time_info *hv_clock; > static struct pvclock_wall_clock wall_clock; > > /* > @@ -52,15 +57,20 @@ static unsigned long kvm_get_wallclock(v > struct pvclock_vcpu_time_info *vcpu_time; > struct timespec ts; > int low, high; > + int cpu; > + > + preempt_disable(); > + cpu = smp_processor_id(); > > low = (int)__pa_symbol(&wall_clock); > high = ((u64)__pa_symbol(&wall_clock) >> 32); > > native_write_msr(msr_kvm_wall_clock, low, high); > > - vcpu_time = &get_cpu_var(hv_clock); > + vcpu_time = &hv_clock[cpu].clock; You are leaving preempt disabled for a lot longer than in should be. In particular, you'll have a round trip to the hypervisor in the middle. It doesn't matter *that* much because wallclock is mostly a one-time thing, so I won't oppose in this basis. But if you have the chance, either fix it, or tell us why you need preemption on for longer than it was before. > void __init kvmclock_init(void) > { > + unsigned long mem; > + > if (!kvm_para_available()) > return; > > + mem = memblock_alloc(sizeof(struct pvclock_aligned_vcpu_time_info) * NR_CPUS, > + PAGE_SIZE); > + if (!mem) > + return; > + hv_clock = __va(mem); > + > if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { > msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; > msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; > > If you don't have kvmclock enabled, which the next line in here would test for, you will exit this function with allocated memory. How about the following? if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) { msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW; msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW; } else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))) return; mem = memblock_alloc(sizeof(struct pvclock_aligned_vcpu_time_info) * NR_CPUS, PAGE_SIZE); if (!mem) return; hv_clock = __va(mem); printk(KERN_INFO "kvm-clock: Using msrs %x and %x", msr_kvm_system_time, msr_kvm_wall_clock); if (kvm_register_clock("boot clock")) { memblock_free(); return; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html