On Thu, 26 Jan 2023 18:45:43 +0000,
Ricardo Koller <ricarkol@xxxxxxxxxx> wrote:
>
> On Tue, Jan 24, 2023 at 2:45 PM Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> >
> > Hi Ricardo,
> >
> > On Fri, Jan 13, 2023 at 03:49:57AM +0000, Ricardo Koller wrote:
> > > Split huge pages eagerly when enabling dirty logging. The goal is to
> > > avoid doing it while faulting on write-protected pages, which
> > > negatively impacts guest performance.
> > >
> > > A memslot marked for dirty logging is split in 1GB pieces at a time.
> > > This is in order to release the mmu_lock and give other kernel threads
> > > the opportunity to run, and also in order to allocate enough pages to
> > > split a 1GB range worth of huge pages (or a single 1GB huge page).
> > > Note that these page allocations can fail, so eager page splitting is
> > > best-effort. This is not a correctness issue though, as huge pages
> > > can still be split on write-faults.
> > >
> > > The benefits of eager page splitting are the same as in x86, added
> > > with commit a3fe5dbda0a4 ("KVM: x86/mmu: Split huge pages mapped by
> > > the TDP MMU when dirty logging is enabled"). For example, when running
> > > dirty_log_perf_test with 64 virtual CPUs (Ampere Altra), 1GB per vCPU,
> > > 50% reads, and 2MB HugeTLB memory, the time it takes vCPUs to access
> > > all of their memory after dirty logging is enabled decreased by 44%
> > > from 2.58s to 1.42s.
> > >
> > > Signed-off-by: Ricardo Koller <ricarkol@xxxxxxxxxx>
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h |  30 ++++++++
> > >  arch/arm64/kvm/mmu.c              | 110 +++++++++++++++++++++++++++++-
> > >  2 files changed, 138 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 35a159d131b5..6ab37209b1d1 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -153,6 +153,36 @@ struct kvm_s2_mmu {
> > >  	/* The last vcpu id that ran on each physical CPU */
> > >  	int __percpu *last_vcpu_ran;
> > >
> > > +	/*
> > > +	 * Memory cache used to split EAGER_PAGE_SPLIT_CHUNK_SIZE worth of huge
> > > +	 * pages. It is used to allocate stage2 page tables while splitting
> > > +	 * huge pages. Its capacity should be EAGER_PAGE_SPLIT_CACHE_CAPACITY.
> > > +	 * Note that the choice of EAGER_PAGE_SPLIT_CHUNK_SIZE influences both
> > > +	 * the capacity of the split page cache (CACHE_CAPACITY), and how often
> > > +	 * KVM reschedules. Be wary of raising CHUNK_SIZE too high.
> > > +	 *
> > > +	 * A good heuristic to pick CHUNK_SIZE is that it should be larger than
> > > +	 * all the available huge-page sizes, and be a multiple of all the
> > > +	 * other ones; for example, 1GB when all the available huge-page sizes
> > > +	 * are (1GB, 2MB, 32MB, 512MB).
> > > +	 *
> > > +	 * CACHE_CAPACITY should have enough pages to cover CHUNK_SIZE; for
> > > +	 * example, 1GB requires the following number of PAGE_SIZE-pages:
> > > +	 * - 512 when using 2MB hugepages with 4KB granules (1GB / 2MB).
> > > +	 * - 513 when using 1GB hugepages with 4KB granules (1 + (1GB / 2MB)).
> > > +	 * - 32 when using 32MB hugepages with 16KB granule (1GB / 32MB).
> > > +	 * - 2 when using 512MB hugepages with 64KB granules (1GB / 512MB).
> > > +	 * CACHE_CAPACITY below assumes the worst case: 1GB hugepages with 4KB
> > > +	 * granules.
> > > +	 *
> > > +	 * Protected by kvm->slots_lock.
> > > +	 */
> > > +#define EAGER_PAGE_SPLIT_CHUNK_SIZE		SZ_1G
> > > +#define EAGER_PAGE_SPLIT_CACHE_CAPACITY \
> > > +	(DIV_ROUND_UP_ULL(EAGER_PAGE_SPLIT_CHUNK_SIZE, SZ_1G) + \
> > > +	 DIV_ROUND_UP_ULL(EAGER_PAGE_SPLIT_CHUNK_SIZE, SZ_2M))
> >
> > Could you instead make use of the existing KVM_PGTABLE_MIN_BLOCK_LEVEL
> > as the batch size? 513 pages across all page sizes is a non-negligible
> > amount of memory that goes largely unused when PAGE_SIZE != 4K.
> >
>
> Sounds good, will refine this for v2.
>
> > With that change it is a lot easier to correctly match the cache
> > capacity to the selected page size. Additionally, we continue to have a
> > single set of batching logic that we can improve later on.
> >
> > > +	struct kvm_mmu_memory_cache split_page_cache;
> > > +
> > >  	struct kvm_arch *arch;
> > >  };
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index 700c5774b50d..41ee330edae3 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -31,14 +31,24 @@ static phys_addr_t hyp_idmap_vector;
> > >
> > >  static unsigned long io_map_base;
> > >
> > > -static phys_addr_t stage2_range_addr_end(phys_addr_t addr, phys_addr_t end)
> > > +bool __read_mostly eager_page_split = true;
> > > +module_param(eager_page_split, bool, 0644);
> > > +
> >
> > Unless someone is really begging for it I'd prefer we not add a module
> > parameter for this.
>
> It was mainly to match x86 and because it makes perf testing a bit
> simpler. What do others think?

From my PoV this is a no.

If you have a flag because this is an experimental feature (like NV),
then this is a kernel option, and you taint the kernel when it is set.
If you have a flag because this is a modal option that makes different
use of the HW which cannot be exposed to userspace (like GICv4), then
this also is a kernel option.

This is neither.

The one thing that would convince me to make it an option is the amount
of memory this thing consumes. 512+ pages is a huge amount, and I'm not
overly happy about that.

Why can't this be a userspace-visible option, selectable on a per-VM
(or memslot) basis?

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.
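
For reference, the capacity arithmetic in the quoted EAGER_PAGE_SPLIT_CACHE_CAPACITY hunk can be checked with a small standalone program. This is only an illustration of the numbers listed in the patch comment (512, 513, 32 and 2 pages for a 1GB chunk), not code from the series; the SZ_* and DIV_ROUND_UP_ULL definitions below are local stand-ins for the kernel macros.

/*
 * Standalone illustration (not kernel code) of the split-cache sizing
 * discussed above: splitting EAGER_PAGE_SPLIT_CHUNK_SIZE bytes down to
 * PAGE_SIZE mappings needs one table page per block that gets broken
 * up, at each level below the block.
 */
#include <stdio.h>

#define SZ_2M	(2ULL << 20)
#define SZ_32M	(32ULL << 20)
#define SZ_512M	(512ULL << 20)
#define SZ_1G	(1ULL << 30)

#define DIV_ROUND_UP_ULL(n, d)	(((n) + (d) - 1) / (d))

/* Worst-case capacity for a given chunk size, as in the quoted macro. */
static unsigned long long cache_capacity(unsigned long long chunk)
{
	return DIV_ROUND_UP_ULL(chunk, SZ_1G) +
	       DIV_ROUND_UP_ULL(chunk, SZ_2M);
}

int main(void)
{
	unsigned long long chunk = SZ_1G;

	/* Per-configuration page counts from the comment in the patch. */
	printf("2MB hugepages, 4KB granule:    %llu\n", chunk / SZ_2M);
	printf("1GB hugepages, 4KB granule:    %llu\n", 1 + chunk / SZ_2M);
	printf("32MB hugepages, 16KB granule:  %llu\n", chunk / SZ_32M);
	printf("512MB hugepages, 64KB granule: %llu\n", chunk / SZ_512M);

	/* Worst case assumed by EAGER_PAGE_SPLIT_CACHE_CAPACITY: 513. */
	printf("cache capacity: %llu\n", cache_capacity(chunk));
	return 0;
}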
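
On the per-VM option Marc asks about: the generic KVM_ENABLE_CAP ioctl already operates on a VM file descriptor, so the userspace side of such a knob could look roughly like the sketch below. The capability number and the meaning of args[0] are placeholders invented for illustration only; nothing in this thread defines them.

/*
 * Hypothetical userspace side of a per-VM eager-split knob. Only the
 * KVM_ENABLE_CAP plumbing is real; the capability number is made up.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>

#define KVM_CAP_EAGER_SPLIT_PLACEHOLDER	999	/* not a real UAPI constant */

int enable_eager_split(int vm_fd, __u64 chunk_size)
{
	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_EAGER_SPLIT_PLACEHOLDER,
		.args = { chunk_size },	/* e.g. split chunk size in bytes */
	};

	/* Returns 0 on success, -1 with errno set on failure. */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}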