Re: [PATCH v2 04/12] KVM: arm64: Add kvm_pgtable_stage2_split()

Ricardo Koller <ricarkol@xxxxxxxxxx> · Thu, 9 Feb 2023 08:17:31 -0800

"(> > > +     if (data->mc_capacity < nr_pages)
> > > +             return -ENOMEM;
> > > +
> > > +     phys = kvm_pte_to_phys(pte);
> > > +     prot = kvm_pgtable_stage2_pte_prot(pte);
> > > +
> > > +     ret = kvm_pgtable_stage2_create_unlinked(data->mmu->pgt, &new, phys,
> > > +                                              level, prot, mc, force_pte);
> > > +     if (ret)
> > > +             return ret;
> > > +
> > > +     if (!stage2_try_break_pte(ctx, data->mmu)) {
> > > +             childp = kvm_pte_follow(new, mm_ops);
> > > +             kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level);
> > > +             mm_ops->put_page(childp);
> > > +             return -EAGAIN;
> > > +     }
> > > +
> > > +     /*
> > > +      * Note, the contents of the page table are guaranteed to be made
> > > +      * visible before the new PTE is assigned because stage2_make_pte()
> > > +      * writes the PTE using smp_store_release().
> > > +      */
> > > +     stage2_make_pte(ctx, new);
> > > +     dsb(ishst);
> > > +     data->mc_capacity -= nr_pages;
> > > +     return 0;
> > > +}
> > > +
> >
> > I think it's possible 'data->mc_capability' to be replaced by 'mc->nobjs'
> > because they're same thing. With this, we needn't to maintain a duplicate
> > 'data->mc_capability' since 'data->mc' has been existing.
>
> Ah, nice, yes. That would be simpler.
>

Actually, there's a complication. The memcache details are hidden
inside of pgtable.c,
so different types of memcaches (for vhe and nvhe) can be used for allocations.
Specifically, the memcache objects are passed as an opaque pointer ("void *")
and can be used with "struct hyp_pool" and "struct kvm_mmu_memory_cache".

So, here are all the options that I can think of:

        1. stage2_split_walker() is just used on the VHE case with the
        "struct kvm_mmu_memory_cache" memcache, so we could just use it
        instead of a "void *":

                kvm_pgtable_stage2_split(..., struct kvm_mmu_memory_cache *mc);

        However, it could be used for the NVHE case as well, plus
        this would go against the overall design of pgtable.c which tries
        to use opaque objects for most things.

        2. add a "get_nobjs()" method to both memcaches. This is tricky
        because "struct hyp_pool" doesn't directly track its capacity. I
        would rather not mess with it.

        3. This whole accounting of available pages in the memcache is
        needed because of the way I implemented stage2_split_walker() and
        the memcache interface.  stage2_split_walker() tries to allocate
        as many pages for the new table as allowed by the capacity of the
        memcache. The issue with blindingly trying until the allocation
        fails is that kvm_mmu_memory_cache_alloc() WARNs and tries to
        allocate using GFP_ATOMIC when !nobjs. We don't want to do that,
        so we could extend kvm_pgtable_mm_ops.zalloc_page() with a
        NO_GFP_ATOMIC_ON_EMPTY (or similar). This flag would have to be
        ignored on the hyp side.

        4. what this patch is currently doing: tracking the capacity by
        hand.

I prefer options 4 and 3. WDYT?

Thanks,
Ricardo

> Thanks!
> Ricardo
>
> >
> > > +int kvm_pgtable_stage2_split(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > > +                          void *mc, u64 mc_capacity)
> > > +{
> > > +     struct stage2_split_data split_data = {
> > > +             .mmu            = pgt->mmu,
> > > +             .memcache       = mc,
> > > +             .mc_capacity    = mc_capacity,
> > > +     };
> > > +
> > > +     struct kvm_pgtable_walker walker = {
> > > +             .cb     = stage2_split_walker,
> > > +             .flags  = KVM_PGTABLE_WALK_LEAF,
> > > +             .arg    = &split_data,
> > > +     };
> > > +
> > > +     return kvm_pgtable_walk(pgt, addr, size, &walker);
> > > +}
> > > +
> > >   int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
> > >                             struct kvm_pgtable_mm_ops *mm_ops,
> > >                             enum kvm_pgtable_stage2_flags flags,
> > >
> >
> > Thanks,
> > Gavin
> >