Use an "unsigned long" instead of a "u64" to track the number of entries
in a pte_list_desc's sptes array.  Both sizes are overkill, as the number
of entries would easily fit into a u8; the goal is purely to get sptes[]
aligned and to size the struct as a whole to be a multiple of a cache
line (64 bytes).  Using a u64 on 32-bit kernels fails on both counts, as
"more" is only 4 bytes.  Dropping "spte_count" to 4 bytes on 32-bit
kernels fixes both the alignment and the overall size.

Add a compile-time assert to ensure the size of pte_list_desc stays a
multiple of the cache line size on modern CPUs (hardcoded because
L1_CACHE_BYTES is configurable via CONFIG_X86_L1_CACHE_SHIFT).

Fixes: 13236e25ebab ("KVM: X86: Optimize pte_list_desc with per-array counter")
Cc: Peter Xu <peterx@xxxxxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 arch/x86/kvm/mmu/mmu.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bd74a287b54a..17ac30b9e22c 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -117,15 +117,17 @@ module_param(dbg, bool, 0644);
 /*
  * Slight optimization of cacheline layout, by putting `more' and `spte_count'
  * at the start; then accessing it will only use one single cacheline for
- * either full (entries==PTE_LIST_EXT) case or entries<=6.
+ * either full (entries==PTE_LIST_EXT) case or entries<=6.  On 32-bit kernels,
+ * the entire struct fits in a single cacheline.
  */
 struct pte_list_desc {
 	struct pte_list_desc *more;
 	/*
-	 * Stores number of entries stored in the pte_list_desc.  No need to be
-	 * u64 but just for easier alignment.  When PTE_LIST_EXT, means full.
+	 * The number of valid entries in sptes[].  Use an unsigned long to
+	 * naturally align sptes[] (a u8 for the count would suffice).  When
+	 * equal to PTE_LIST_EXT, this particular list is full.
 	 */
-	u64 spte_count;
+	unsigned long spte_count;
 	u64 *sptes[PTE_LIST_EXT];
 };
 
@@ -5640,6 +5642,9 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 	tdp_root_level = tdp_forced_root_level;
 	max_tdp_level = tdp_max_root_level;
 
+	BUILD_BUG_ON_MSG((sizeof(struct pte_list_desc) % 64),
+			 "pte_list_desc is not a multiple of cache line size (on modern CPUs)");
+
 	/*
 	 * max_huge_page_level reflects KVM's MMU capabilities irrespective
 	 * of kernel support, e.g. KVM may be capable of using 1GB pages when
-- 
2.37.0.rc0.161.g10f37bed90-goog