Hi Marc, On 7/6/20 1:17 PM, Marc Zyngier wrote: > On 2020-06-25 13:19, Alexandru Elisei wrote: >> Hi Marc, >> >> On 6/16/20 5:18 PM, Marc Zyngier wrote: >>> Hi Alexandru, >>> [..] >>>>> [..] >>>>> >>>>> /** >>>>> - * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation. >>>>> - * @kvm: The KVM struct pointer for the VM. >>>>> + * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure >>>>> + * @kvm: The pointer to the KVM structure >>>>> + * @mmu: The pointer to the s2 MMU structure >>>>> * >>>>> * Allocates only the stage-2 HW PGD level table(s) of size defined by >>>>> - * stage2_pgd_size(kvm). >>>>> + * stage2_pgd_size(mmu->kvm). >>>>> * >>>>> * Note we don't need locking here as this is only called when the VM is >>>>> * created, which can only be done once. >>>>> */ >>>>> -int kvm_alloc_stage2_pgd(struct kvm *kvm) >>>>> +int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu) >>>>> { >>>>> phys_addr_t pgd_phys; >>>>> pgd_t *pgd; >>>>> + int cpu; >>>>> >>>>> - if (kvm->arch.pgd != NULL) { >>>>> + if (mmu->pgd != NULL) { >>>>> kvm_err("kvm_arch already initialized?\n"); >>>>> return -EINVAL; >>>>> } >>>>> @@ -1024,8 +1040,20 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm) >>>>> if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm))) >>>>> return -EINVAL; >>>> >>>> We don't free the pgd if we get the error above, but we do free it below, if >>>> allocating last_vcpu_ran fails. Shouldn't we free it in both cases? >>> >>> Worth investigating. This code gets majorly revamped in the NV series, so it is >>> likely that I missed something in the middle. >> >> You didn't miss anything, I checked and it's the same in the upstream >> version of KVM. >> >> kvm_arch_init_vm() returns with an error if this functions fails, so it's up to >> the function to do the clean up. kvm_alloc_pages_exact() returns NULL >> on error, so >> at this point we have a valid allocation of physical contiguous pages. >> Failing to >> create a VM is not a fatal error for the system, so I'm thinking that maybe we >> should free those pages for the rest of the system to use. However, this is a >> minor issue, and the patch isn't supposed to make any functional changes, so it >> can be probably be left for another patch and not add more to an >> already big series. > > Cool. Will you be posting such patch? I was considering one, but then I realized there's something that I still don't understand. alloc_pages_exact() allocates 2^order pages (where 2^order pages >= stage2_pgd_size()) via __get_free_pages -> alloc_pages(), then it frees the unneeded pages at the *end* of the allocation in make_alloc_exact(). So the beginning of the allocated physical area remains aligned to 2^order pages, and implicitly to stage2_pgd_size(). But now I can't figure out why kvm_vttbr_baddr_mask() isn't simply defined as ~stage2_pgd_size(). Is it possible for kvm_vttbr_baddr_mask() to return an alignment larger than the size of the table? I can't seem to find anything to that effect in ARM arm (but that doesn't mean that I didn't miss anything). Thanks, Alex