On Mon, Aug 02, 2021 at 04:36:24PM +0200, Paolo Bonzini wrote:
> On 31/07/21 00:37, David Matlack wrote:
> > The memslot for a given gfn is looked up multiple times during page
> > fault handling. Avoid binary searching for it multiple times by caching
> > the least recently used slot. There is an existing VM-wide LRU slot but
> > that does not work well for cases where vCPUs are accessing memory in
> > different slots (see performance data below).
> >
> > Another benefit of caching the least recently used slot (versus looking
> > up the slot once and passing around a pointer) is speeding up memslot
> > lookups *across* faults and during spte prefetching.
> >
> > To measure the performance of this change I ran dirty_log_perf_test with
> > 64 vCPUs and 64 memslots and measured "Populate memory time" and
> > "Iteration 2 dirty memory time". Tests were run with eptad=N to force
> > dirty logging to use fast_page_fault so its performance could be
> > measured.
> >
> > Config     | Metric                        | Before | After
> > ---------- | ----------------------------- | ------ | ------
> > tdp_mmu=Y  | Populate memory time          | 6.76s  | 5.47s
> > tdp_mmu=Y  | Iteration 2 dirty memory time | 2.83s  | 0.31s
> > tdp_mmu=N  | Populate memory time          | 20.4s  | 18.7s
> > tdp_mmu=N  | Iteration 2 dirty memory time | 2.65s  | 0.30s
> >
> > The "Iteration 2 dirty memory time" results are especially compelling
> > because they are equivalent to running the same test with a single
> > memslot. In other words, fast_page_fault performance no longer scales
> > with the number of memslots.
> >
> > Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
>
> It's the *most* recently used slot index, of course. :)  That's true of
> lru_slot as well.

*facepalm* I'll include a patch in v2 to fix the name of the existing
lru_slot to something like mru_slot or last_used_slot.

> > +static inline struct kvm_memory_slot *get_slot(struct kvm_memslots *slots, int slot_index)
> > +{
> > +	if (slot_index < 0 || slot_index >= slots->used_slots)
> > +		return NULL;
> > +
> > +	return &slots->memslots[slot_index];
> > +}
> > +
>
> Since there are plenty of arrays inside struct kvm_memory_slot*, do we want
> to protect this against speculative out-of-bounds accesses with
> array_index_nospec?

I'm not sure when it makes sense to use array_index_nospec. Perhaps this
is a good candidate since vcpu->lru_slot_index and the length of
memslots[] can both be controlled by userspace. I'll play around with
adding it to see if there are any performance trade-offs.

> > +static inline struct kvm_memory_slot *
> > +search_memslots(struct kvm_memslots *slots, gfn_t gfn)
> > +{
> > +	int slot_index = __search_memslots(slots, gfn);
> > +
> > +	return get_slot(slots, slot_index);
> >  }
>
> Let's use this occasion to remove the duplication between __gfn_to_memslot
> and search_memslots; you can make search_memslots do the search and place
> the LRU (ehm, MRU) code in __gfn_to_memslot only.  So:
>
> - the new patch 1 (something like "make search_memslots search without LRU
> caching") is basically this series's patch 2, plus a tree-wide replacement
> of search_memslots with __gfn_to_memslot.
>
> - the new patch 2 is this series's patch 1, except kvm_vcpu_gfn_to_memslot
> uses search_memslots instead of __search_memslots.  The comments in patch
> 2's commit message about the double misses move to this commit message.

Will do.

> > +	if (slot)
> > +		vcpu->lru_slot_index = slot_index;
>
> Let's call it lru_slot for consistency with the field of struct
> kvm_memory_slots.
I'll make sure the two have consistent names when I rename them to get
rid of "lru".

> Paolo
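
To make the array_index_nospec idea above concrete, what I'm planning to
experiment with would look roughly like this (an untested sketch on top of
this patch, not the final code):

static inline struct kvm_memory_slot *get_slot(struct kvm_memslots *slots, int slot_index)
{
	if (slot_index < 0 || slot_index >= slots->used_slots)
		return NULL;

	/*
	 * Clamp the index under speculation so that a mispredicted bounds
	 * check cannot be used to read past the end of memslots[].
	 */
	slot_index = array_index_nospec(slot_index, slots->used_slots);

	return &slots->memslots[slot_index];
}

(This assumes <linux/nospec.h> is available where get_slot() is defined; I'd
add the include if it isn't already pulled in, and measure whether the extra
clamp shows up in the dirty_log_perf_test numbers.)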