On Wed, 24 Jan 2024 20:49:06 +0000,
Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> 
> To date the translation cache LRU policy relies on the ordering of the
> linked-list to pick the victim, as entries are moved to the head of the
> list on every cache hit. These sort of transformations are incompatible
> with an rculist, necessitating a different strategy for recording usage
> in-place.
> 
> Count the number of cache hits since the last translation cache miss for
> every entry. The preferences for selecting a victim are as follows:
> 
>  - Invalid entries over valid entries
> 
>  - Valid entry with the lowest usage count
> 
>  - In the case of a tie, pick the entry closest to the tail (oldest)
> 
> Signed-off-by: Oliver Upton <oliver.upton@xxxxxxxxx>
> ---
>  arch/arm64/kvm/vgic/vgic-its.c | 42 ++++++++++++++++++++++++++--------
>  1 file changed, 32 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index aec82d9a1b3c..ed0c6c333a6c 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -154,6 +154,7 @@ struct vgic_translation_cache_entry {
>  	u32			devid;
>  	u32			eventid;
>  	struct vgic_irq		*irq;
> +	atomic64_t		usage_count;
>  };
> 
>  /**
> @@ -577,13 +578,7 @@ static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
>  		    cte->eventid != eventid)
>  			continue;
> 
> -		/*
> -		 * Move this entry to the head, as it is the most
> -		 * recently used.
> -		 */
> -		if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
> -			list_move(&cte->entry, &dist->lpi_translation_cache);
> -
> +		atomic64_inc(&cte->usage_count);
>  		return cte->irq;
>  	}
> 
> @@ -616,6 +611,30 @@ static unsigned int vgic_its_max_cache_size(struct kvm *kvm)
>  	return atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE;
>  }
> 
> +static struct vgic_translation_cache_entry *vgic_its_cache_victim(struct vgic_dist *dist)
> +{
> +	struct vgic_translation_cache_entry *cte, *victim = NULL;
> +	u64 min, tmp;
> +
> +	/*
> +	 * Find the least used cache entry since the last cache miss, preferring
> +	 * older entries in the case of a tie. Note that usage accounting is
> +	 * deliberately non-atomic, so this is all best-effort.
> +	 */
> +	list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
> +		if (!cte->irq)
> +			return cte;
> +
> +		tmp = atomic64_xchg_relaxed(&cte->usage_count, 0);
> +		if (!victim || tmp <= min) {

min is not initialised until after the first round. Not great. How come
the compiler doesn't spot this?

> +			victim = cte;
> +			min = tmp;
> +		}
> +	}

So this resets all the counters on each search for a new insertion?
That seems expensive, especially on large VMs (512 * 16 = up to 8K SWP
instructions in a tight loop, and I'm not even mentioning the fun
without LSE). I can think of at least one box that will throw its
interconnect out of the pram if tickled that way.

I'd rather the new cache entry inherited the max of the current set,
making insertion a lot cheaper. We can always detect the overflow and
do a full invalidation in that case (worst case -- better options
exist).
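Something along the lines of the below is what I have in mind --
completely untested, the helper names (vgic_its_cache_max_usage(),
vgic_its_cache_seed()) are invented for the sake of the example, the
max scan could obviously be folded into the victim walk so the list is
only traversed once, and I'm assuming vgic_its_invalidate_cache() (or
whatever it turns into after the RCU conversion) as the big hammer:

static u64 vgic_its_cache_max_usage(struct vgic_dist *dist)
{
	struct vgic_translation_cache_entry *cte;
	u64 max = 0, tmp;

	/* Plain reads only; nobody's counter gets written on a miss */
	list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
		tmp = atomic64_read(&cte->usage_count);
		if (tmp > max)
			max = tmp;
	}

	return max;
}

static void vgic_its_cache_seed(struct kvm *kvm, struct vgic_dist *dist,
				struct vgic_translation_cache_entry *new)
{
	u64 max = vgic_its_cache_max_usage(dist);

	/*
	 * If the counters ever get near wrapping, drop the lot and
	 * start from scratch (worst case -- better options exist).
	 */
	if (unlikely(max == U64_MAX)) {
		vgic_its_invalidate_cache(kvm);
		max = 0;
	}

	/*
	 * The new entry starts level with the most used one, so a
	 * fresh translation isn't immediately the preferred victim.
	 */
	atomic64_set(&new->usage_count, max);
}

That keeps the insertion path read-only as far as the other entries are
concerned, which is what the interconnect will thank you for.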
> +
> +	return victim;
> +}
> +
>  static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
>  				       u32 devid, u32 eventid,
>  				       struct vgic_irq *irq)
> @@ -645,9 +664,12 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
>  		goto out;
> 
>  	if (dist->lpi_cache_count >= vgic_its_max_cache_size(kvm)) {
> -		/* Always reuse the last entry (LRU policy) */
> -		victim = list_last_entry(&dist->lpi_translation_cache,
> -					 typeof(*cte), entry);
> +		victim = vgic_its_cache_victim(dist);
> +		if (WARN_ON_ONCE(!victim)) {
> +			victim = new;
> +			goto out;
> +		}

I don't understand how this could happen. It sort of explains the
oddity I was mentioning earlier, but I don't think we need this
complexity.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.