Adding Tom and Boris On 7/6/18 12:47 PM, Paolo Bonzini wrote: > On 06/07/2018 18:13, Thomas Gleixner wrote: >> To allow early utilization of kvmclock it is required to remove the >> memblock dependency. memblock is currently used to allocate the per >> cpu data for kvmclock. >> >> The first patch replaces the memblock with a static array sized 64bytes * >> NR_CPUS and was posted by Pavel. That patch allocates everything statically >> which is a waste when kvmclock is not used. >> >> The rest of the series cleans up the code and converts it to per cpu >> variables but does not put the kvmclock data into the per cpu area as that >> has an issue vs. mapping the boot cpu data into the VDSO (leaks arbitrary >> data, unless page sized). >> >> The per cpu data consists of pointers to the actual data. For the boot cpu >> a page sized array is statically allocated which can be mapped into the >> VDSO. That array is used for initializing the first 64 CPU pointers. If >> there are more CPUs the pvclock data is allocated during CPU bringup. >> >> So this still will have some overhead when kvmclock is not in use, but >> bringing it down to zero would be a massive trainwreck and even more >> indirections. >> >> Thanks, >> >> tglx >> >> 8<-------------- >> a/arch/x86/include/asm/kvm_guest.h | 7 >> arch/x86/include/asm/kvm_para.h | 1 >> arch/x86/kernel/kvm.c | 14 - >> arch/x86/kernel/kvmclock.c | 262 ++++++++++++++----------------------- >> arch/x86/kernel/setup.c | 4 >> 5 files changed, 105 insertions(+), 183 deletions(-) >> >> >> >> > Thanks, this is really nice. With the small changes from my review, > > Acked-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> Hi Paolo and Thomas, This series breaks SEV guest support. The physical address of both wall_clock and hv_clock is shared with hypervisor for updates. In case of SEV the address must be mapped as 'decrypted (i.e C=0)' so that both guest and HV can access the data correctly. The follow patch should map the pages as decrypted. diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 890e9e5..640c796 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -251,6 +251,20 @@ static void kvm_shutdown(void) native_machine_shutdown(); } +static void sev_map_clocks_decrypted(void) +{ + if (!sev_active()) + return; + + /* + * wall_clock and hv_clock addresses are shared with hypervisor. + * When SEV is enabled, any addresses shared with hypervisor must be + * mapped decrypted. + */ + early_set_memory_decrypted((unsigned long) wall_clock, WALL_CLOCK_SIZE); + early_set_memory_decrypted((unsigned long) hv_clock, HV_CLOCK_SIZE); +} + void __init kvmclock_init(void) { struct pvclock_vcpu_time_info *vcpu_time; @@ -269,6 +283,8 @@ void __init kvmclock_init(void) wall_clock = (struct pvclock_wall_clock *)wall_clock_mem; hv_clock = (struct pvclock_vsyscall_time_info *)hv_clock_mem; + sev_map_clocks_decrypted(); + if (kvm_register_clock("primary cpu clock")) { hv_clock = NULL; wall_clock = NULL; But this patch triggers the below kernel crash. early_set_memory_decrypted() uses kernel_physical_mapping_init() to split the large pages and clear the C-bit. It seems this function still has dependency with memblock. [ 0.000000] Hypervisor detected: KVM [ 0.000000] Kernel panic - not syncing: alloc_low_pages: ran out of memory [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-rc3-sev #19 [ 0.000000] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 0.000000] Call Trace: [ 0.000000] ? dump_stack+0x5c/0x80 [ 0.000000] ? panic+0xe7/0x247 [ 0.000000] ? alloc_low_pages+0x130/0x130 [ 0.000000] ? kernel_physical_mapping_init+0xe0/0x204 [ 0.000000] ? early_set_memory_enc_dec+0x10f/0x160 [ 0.000000] ? 0xffffffffb1000000 [ 0.000000] ? kvmclock_init+0x83/0x20a [ 0.000000] ? setup_arch+0x42c/0xce6 [ 0.000000] ? start_kernel+0x67/0x531 [ 0.000000] ? load_ucode_bsp+0x76/0x12e [ 0.000000] ? secondary_startup_64+0xa5/0xb0 [ 0.000000] ---[ end Kernel panic - not syncing: alloc_low_pages: ran out of memory ]--- - Brijesh