Re: [patch 02/18] x86: kvmclock: allocate pvclock shared memory area

Glauber Costa <glommer@xxxxxxxxxxxxx> · Thu, 15 Nov 2012 21:05:55 +0400

On 11/15/2012 04:08 AM, Marcelo Tosatti wrote:
> We want to expose the pvclock shared memory areas, which 
> the hypervisor periodically updates, to userspace.
> 
> For a linear mapping from userspace, it is necessary that
> entire page sized regions are used for array of pvclock 
> structures.
> 
> There is no such guarantee with per cpu areas, therefore move
> to memblock_alloc based allocation.
Ok.

> 
> Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
For the concept:

Acked-by: Glauber Costa <glommer@xxxxxxxxxxxxx>

I do have one comment.

> 
> Index: vsyscall/arch/x86/kernel/kvmclock.c
> ===================================================================
> --- vsyscall.orig/arch/x86/kernel/kvmclock.c
> +++ vsyscall/arch/x86/kernel/kvmclock.c
> @@ -23,6 +23,7 @@
>  #include <asm/apic.h>
>  #include <linux/percpu.h>
>  #include <linux/hardirq.h>
> +#include <linux/memblock.h>
>  
>  #include <asm/x86_init.h>
>  #include <asm/reboot.h>
> @@ -39,7 +40,11 @@ static int parse_no_kvmclock(char *arg)
>  early_param("no-kvmclock", parse_no_kvmclock);
>  
>  /* The hypervisor will put information about time periodically here */
> -static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, hv_clock);
> +struct pvclock_aligned_vcpu_time_info {
> +	struct pvclock_vcpu_time_info clock;
> +} __attribute__((__aligned__(SMP_CACHE_BYTES)));
> +
> +static struct pvclock_aligned_vcpu_time_info *hv_clock;
>  static struct pvclock_wall_clock wall_clock;
>  
>  /*
> @@ -52,15 +57,20 @@ static unsigned long kvm_get_wallclock(v
>  	struct pvclock_vcpu_time_info *vcpu_time;
>  	struct timespec ts;
>  	int low, high;
> +	int cpu;
> +
> +	preempt_disable();
> +	cpu = smp_processor_id();
>  
>  	low = (int)__pa_symbol(&wall_clock);
>  	high = ((u64)__pa_symbol(&wall_clock) >> 32);
>  
>  	native_write_msr(msr_kvm_wall_clock, low, high);
>  
> -	vcpu_time = &get_cpu_var(hv_clock);
> +	vcpu_time = &hv_clock[cpu].clock;

You are leaving preempt disabled for a lot longer than in should be. In
particular, you'll have a round trip to the hypervisor in the middle. It
doesn't matter *that* much because wallclock is mostly a one-time thing,
so I won't oppose in this basis.

But if you have the chance, either fix it, or tell us why you need
preemption on for longer than it was before.

>  void __init kvmclock_init(void)
>  {
> +	unsigned long mem;
> +
>  	if (!kvm_para_available())
>  		return;
>  
> +	mem = memblock_alloc(sizeof(struct pvclock_aligned_vcpu_time_info) * NR_CPUS,
> +			     PAGE_SIZE);
> +	if (!mem)
> +		return;
> +	hv_clock = __va(mem);
> +
>  	if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
>  		msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
>  		msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
> 
>
If you don't have kvmclock enabled, which the next line in here would
test for, you will exit this function with allocated memory. How about
the following?

if (kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
    msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
    msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
    return;

mem = memblock_alloc(sizeof(struct pvclock_aligned_vcpu_time_info)
                     * NR_CPUS, PAGE_SIZE);
if (!mem)
    return;
hv_clock = __va(mem);

printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
       msr_kvm_system_time, msr_kvm_wall_clock);

if (kvm_register_clock("boot clock")) {
    memblock_free();
    return;
}

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html