On Thu, Feb 9, 2023 at 2:48 PM Gavin Shan <gshan@xxxxxxxxxx> wrote:
>
> On 2/10/23 3:17 AM, Ricardo Koller wrote:
> >>>> +	if (data->mc_capacity < nr_pages)
> >>>> +		return -ENOMEM;
> >>>> +
> >>>> +	phys = kvm_pte_to_phys(pte);
> >>>> +	prot = kvm_pgtable_stage2_pte_prot(pte);
> >>>> +
> >>>> +	ret = kvm_pgtable_stage2_create_unlinked(data->mmu->pgt, &new, phys,
> >>>> +						 level, prot, mc, force_pte);
> >>>> +	if (ret)
> >>>> +		return ret;
> >>>> +
> >>>> +	if (!stage2_try_break_pte(ctx, data->mmu)) {
> >>>> +		childp = kvm_pte_follow(new, mm_ops);
> >>>> +		kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level);
> >>>> +		mm_ops->put_page(childp);
> >>>> +		return -EAGAIN;
> >>>> +	}
> >>>> +
> >>>> +	/*
> >>>> +	 * Note, the contents of the page table are guaranteed to be made
> >>>> +	 * visible before the new PTE is assigned because stage2_make_pte()
> >>>> +	 * writes the PTE using smp_store_release().
> >>>> +	 */
> >>>> +	stage2_make_pte(ctx, new);
> >>>> +	dsb(ishst);
> >>>> +	data->mc_capacity -= nr_pages;
> >>>> +	return 0;
> >>>> +}
> >>>> +
> >>>
> >>> I think it's possible for 'data->mc_capacity' to be replaced by 'mc->nobjs'
> >>> because they're the same thing. With this, we needn't maintain a duplicate
> >>> 'data->mc_capacity' since 'data->mc' already exists.
> >>
> >> Ah, nice, yes. That would be simpler.
> >>
> >
> > Actually, there's a complication. The memcache details are hidden
> > inside of pgtable.c, so different types of memcaches (for VHE and nVHE)
> > can be used for allocations. Specifically, the memcache objects are
> > passed as an opaque pointer ("void *") and can be used with both
> > "struct hyp_pool" and "struct kvm_mmu_memory_cache".
> >
> > So, here are all the options that I can think of:
> >
> > 1.
> > stage2_split_walker() is just used in the VHE case with the
> > "struct kvm_mmu_memory_cache" memcache, so we could just use that
> > type instead of a "void *":
> >
> > 	kvm_pgtable_stage2_split(..., struct kvm_mmu_memory_cache *mc);
> >
> > However, it could be used for the nVHE case as well, plus
> > this would go against the overall design of pgtable.c, which tries
> > to use opaque objects for most things.
> >
> > 2. Add a "get_nobjs()" method to both memcaches. This is tricky
> > because "struct hyp_pool" doesn't directly track its capacity. I
> > would rather not mess with it.
> >
> > 3. This whole accounting of available pages in the memcache is
> > needed because of the way I implemented stage2_split_walker() and
> > the memcache interface. stage2_split_walker() tries to allocate
> > as many pages for the new table as allowed by the capacity of the
> > memcache. The issue with blindly trying until the allocation
> > fails is that kvm_mmu_memory_cache_alloc() WARNs and tries to
> > allocate using GFP_ATOMIC when !nobjs. We don't want to do that,
> > so we could extend kvm_pgtable_mm_ops.zalloc_page() with a
> > NO_GFP_ATOMIC_ON_EMPTY flag (or similar). This flag would have to be
> > ignored on the hyp side.
> >
> > 4. What this patch is currently doing: tracking the capacity by
> > hand.
> >
> > I prefer options 4 and 3. WDYT?
> >
>
> Yeah, stage2_split_walker() is currently only used by VHE and it calls
> stage2_map_walker() to create the unlinked page table, which is shared by
> VHE and nVHE. I think option 3 would be better than 4 because we generally
> just want to fetch the pre-allocated page table, instead of allocating a
> new page table with GFP_ATOMIC. However, I'm also fine with option 4.

Going with option 4 then for version 3.

> I think this can be improved in the future if you agree.

Definitely!
Thanks,
Ricardo

> >>
> >>>
> >>>> +int kvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
> >>>> +			     void *mc, u64 mc_capacity)
> >>>> +{
> >>>> +	struct stage2_split_data split_data = {
> >>>> +		.mmu		= pgt->mmu,
> >>>> +		.memcache	= mc,
> >>>> +		.mc_capacity	= mc_capacity,
> >>>> +	};
> >>>> +
> >>>> +	struct kvm_pgtable_walker walker = {
> >>>> +		.cb	= stage2_split_walker,
> >>>> +		.flags	= KVM_PGTABLE_WALK_LEAF,
> >>>> +		.arg	= &split_data,
> >>>> +	};
> >>>> +
> >>>> +	return kvm_pgtable_walk(pgt, addr, size, &walker);
> >>>> +}
> >>>> +
> >>>>  int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
> >>>>  			      struct kvm_pgtable_mm_ops *mm_ops,
> >>>>  			      enum kvm_pgtable_stage2_flags flags,
> >>>>
> >>>
>
> Thanks,
> Gavin
>