On 9/10/18 7:27 AM, Borislav Petkov wrote:
> On Fri, Sep 07, 2018 at 12:57:30PM -0500, Brijesh Singh wrote:
>> Currently, the per-cpu pvclock data is allocated dynamically when
>> cpu > HVC_BOOT_ARRAY_SIZE.
> Well no, you need to write this correctly - what is "cpu >
> HVC_BOOT_ARRAY_SIZE" ?!
>
> ( I know what it is but I know it only because I've looked at that code before. )
>
> So no, please explain it in English not in code.
>> The physical address of this variable is
>> shared between the guest and the hypervisor hence it must be mapped as
>> unencrypted (ie. C=0) when SEV is active.
> This sentence is a good example about how to explain stuff in commit
> messages.
>
>> The C-bit works on a page,
> "The C-bit determines the encryption status of a 4K page."
>
>> hence we will be required to perform a
> Use passive tone in your commit message: no "we", etc...
>
>> full 4k page allocation to store a single 32-byte pvclock variable. It
>> will waste fairly sizeable amount of memory since each CPU will be doing
> "... will waste *a* fairly sizeable amount of ..."
>
>> a separate 4k allocation.
> Start new paragraph here and use passive tone.
>
>> Let's define a second array for the SEV case to
>> statically allocate for NR_CPUS and put this array in .data..decrypted
> NR_CPUS needs explaining for the unenlightened reader. Also,
>
> "... put this array in *the* .data..decrypted section... "
>
>> section so that its mapped with C=0 during boot.
> <---- newline here.
>
>> The .data..decrypted
>> section has a big chunk of memory that is currently unused. And since
>> second array will be used only when memory encryption is active hence
> "... since *the* second array... "
>
> s/hence //
>
>> free it when encryption is not active.
>>
>> Signed-off-by: Brijesh Singh <brijesh.singh@xxxxxxx>
>> Suggested-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>> Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
>> Cc: kvm@xxxxxxxxxxxxxxx
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Borislav Petkov <bp@xxxxxxx>
>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> Cc: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
>> Cc: kvm@xxxxxxxxxxxxxxx
>> Cc: "Radim Krčmář" <rkrcmar@xxxxxxxxxx>
>> ---
>> arch/x86/include/asm/mem_encrypt.h | 4 ++++
>> arch/x86/kernel/kvmclock.c | 14 ++++++++++++++
>> arch/x86/kernel/vmlinux.lds.S | 3 +++
>> arch/x86/mm/init.c | 3 +++
>> arch/x86/mm/mem_encrypt.c | 10 ++++++++++
>> 5 files changed, 34 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
>> index 802b2eb..cc46584 100644
>> --- a/arch/x86/include/asm/mem_encrypt.h
>> +++ b/arch/x86/include/asm/mem_encrypt.h
>> @@ -48,11 +48,13 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size);
>>
>> /* Architecture __weak replacement functions */
>> void __init mem_encrypt_init(void);
>> +void __init free_decrypted_mem(void);
> Proper prefixing:
>
> "mem_encrypt_free_decrypted"
>
> or so
>
>> bool sme_active(void);
>> bool sev_active(void);
>>
>> #define __decrypted __attribute__((__section__(".data..decrypted")))
>> +#define __decrypted_aux __attribute__((__section__(".data..decrypted.aux")))
>>
>> #else /* !CONFIG_AMD_MEM_ENCRYPT */
>>
>> @@ -80,6 +82,7 @@ static inline int __init
>> early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0; }
>>
>> #define __decrypted
>> +#define __decrypted_aux
>>
>> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>>
>> @@ -93,6 +96,7 @@ early_set_memory_encrypted(unsigned long vaddr, unsigned long size) { return 0;
>> #define __sme_pa_nodebug(x) (__pa_nodebug(x) | sme_me_mask)
>>
>> extern char __start_data_decrypted[], __end_data_decrypted[];
>> +extern char __start_data_decrypted_aux[];
>>
>> #endif /* __ASSEMBLY__ */
>>
>> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
>> index 376fd3a..6086b56 100644
>> --- a/arch/x86/kernel/kvmclock.c
>> +++ b/arch/x86/kernel/kvmclock.c
>> @@ -65,6 +65,15 @@ static struct pvclock_vsyscall_time_info
>> static struct pvclock_wall_clock wall_clock __decrypted;
>> static DEFINE_PER_CPU(struct pvclock_vsyscall_time_info *, hv_clock_per_cpu);
>>
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +/*
>> + * The auxiliary array will be used when SEV is active. In non-SEV case,
>> + * it will be freed by free_decrypted_mem().
>> + */
>> +static struct pvclock_vsyscall_time_info
>> +	hv_clock_aux[NR_CPUS] __decrypted_aux;
> Hmm, so worst case that's 64 4K pages:
>
> (8192*32)/4096 = 64 4K pages.

We can minimize the worst-case memory usage. The number of VCPUs
supported by KVM may be less than NR_CPUS. E.g., KVM_MAX_VCPUS is
currently set to 288:

(288 * 64)/4096 = 4.5, i.e. 5 4K pages.

(pvclock_vsyscall_time_info is cache aligned so it will be 64 bytes)

#if NR_CPUS > KVM_MAX_VCPUS
#define HV_AUX_ARRAY_SIZE KVM_MAX_VCPUS
#else
#define HV_AUX_ARRAY_SIZE NR_CPUS
#endif

static struct pvclock_vsyscall_time_info
	hv_clock_aux[HV_AUX_ARRAY_SIZE] __decrypted_aux;

> Now, the real question from all this SNAFU is, why can't all those point
> to a single struct pvclock_vsyscall_time_info and all CPUs read a single
> thing? Why do they have to be per-CPU and thus waste so much memory?
>
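
For reference, the page counts discussed above can be checked with a small user-space sketch. This is not the kernel code: the DEMO_* names and struct demo_time_info are made up for illustration, the 8192 and 288 values are simply the NR_CPUS and KVM_MAX_VCPUS figures quoted in this thread, and the 64-byte element size assumes cache-line alignment via the GCC/Clang aligned attribute.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants taken from the numbers quoted in this thread. */
#define DEMO_PAGE_SIZE     4096u
#define DEMO_NR_CPUS       8192u
#define DEMO_KVM_MAX_VCPUS 288u

/*
 * Hypothetical stand-in for struct pvclock_vsyscall_time_info:
 * 32 bytes of payload, padded to 64 bytes by cache-line alignment.
 */
struct demo_time_info {
	unsigned char payload[32];
} __attribute__((aligned(64)));

static_assert(sizeof(struct demo_time_info) == 64,
	      "element is expected to occupy one 64-byte cache line");

/* Size the array by the smaller of the two limits, as proposed above. */
#if DEMO_NR_CPUS > DEMO_KVM_MAX_VCPUS
#define DEMO_AUX_ARRAY_SIZE DEMO_KVM_MAX_VCPUS
#else
#define DEMO_AUX_ARRAY_SIZE DEMO_NR_CPUS
#endif

/* Whole 4K pages needed to hold nelems entries, rounded up. */
static inline size_t demo_pages_needed(size_t nelems)
{
	size_t bytes = nelems * sizeof(struct demo_time_info);

	return (bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

With these assumptions, demo_pages_needed(DEMO_AUX_ARRAY_SIZE) covers 288 * 64 = 18432 bytes and rounds up to 5 pages, whereas sizing by DEMO_NR_CPUS would take 8192 * 64 = 524288 bytes, i.e. 128 pages.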