Re: [PATCH v2 1/9] KVM: arm/arm64: vgic: Add LPI translation cache definition

Marc Zyngier <marc.zyngier@xxxxxxx> · Wed, 12 Jun 2019 10:52:25 +0100

Hi Julien,

On Wed, 12 Jun 2019 09:16:21 +0100,
Julien Thierry <julien.thierry@xxxxxxx> wrote:
> 
> Hi Marc,
> 
> On 11/06/2019 18:03, Marc Zyngier wrote:
> > Add the basic data structure that expresses an MSI to LPI
> > translation as well as the allocation/release hooks.
> > 
> > THe size of the cache is arbitrarily defined as 4*nr_vcpus.
> >
> 
> The size has been arbitrarily changed to 16*nr_vcpus :) .

Well spotted! ;-)

> 
> Nit: The*

Ah, usual lazy finger on the Shift key... One day I'll learn to type.

> 
> > Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> > ---
> >  include/kvm/arm_vgic.h        |  3 +++
> >  virt/kvm/arm/vgic/vgic-init.c |  5 ++++
> >  virt/kvm/arm/vgic/vgic-its.c  | 49 +++++++++++++++++++++++++++++++++++
> >  virt/kvm/arm/vgic/vgic.h      |  2 ++
> >  4 files changed, 59 insertions(+)
> > 
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index c36c86f1ec9a..ca7bcf52dc85 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -260,6 +260,9 @@ struct vgic_dist {
> >  	struct list_head	lpi_list_head;
> >  	int			lpi_list_count;
> >  
> > +	/* LPI translation cache */
> > +	struct list_head	lpi_translation_cache;
> > +
> >  	/* used by vgic-debug */
> >  	struct vgic_state_iter *iter;
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> > index 3bdb31eaed64..c7c4c77dd430 100644
> > --- a/virt/kvm/arm/vgic/vgic-init.c
> > +++ b/virt/kvm/arm/vgic/vgic-init.c
> > @@ -64,6 +64,7 @@ void kvm_vgic_early_init(struct kvm *kvm)
> >  	struct vgic_dist *dist = &kvm->arch.vgic;
> >  
> >  	INIT_LIST_HEAD(&dist->lpi_list_head);
> > +	INIT_LIST_HEAD(&dist->lpi_translation_cache);
> >  	raw_spin_lock_init(&dist->lpi_list_lock);
> >  }
> >  
> > @@ -305,6 +306,7 @@ int vgic_init(struct kvm *kvm)
> >  	}
> >  
> >  	if (vgic_has_its(kvm)) {
> > +		vgic_lpi_translation_cache_init(kvm);
> >  		ret = vgic_v4_init(kvm);
> >  		if (ret)
> >  			goto out;
> > @@ -346,6 +348,9 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
> >  		INIT_LIST_HEAD(&dist->rd_regions);
> >  	}
> >  
> > +	if (vgic_has_its(kvm))
> > +		vgic_lpi_translation_cache_destroy(kvm);
> > +
> >  	if (vgic_supports_direct_msis(kvm))
> >  		vgic_v4_teardown(kvm);
> >  }
> > diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> > index 44ceaccb18cf..ce9bcddeb7f1 100644
> > --- a/virt/kvm/arm/vgic/vgic-its.c
> > +++ b/virt/kvm/arm/vgic/vgic-its.c
> > @@ -149,6 +149,14 @@ struct its_ite {
> >  	u32 event_id;
> >  };
> >  
> > +struct vgic_translation_cache_entry {
> > +	struct list_head	entry;
> > +	phys_addr_t		db;
> > +	u32			devid;
> > +	u32			eventid;
> > +	struct vgic_irq		*irq;
> > +};
> > +
> >  /**
> >   * struct vgic_its_abi - ITS abi ops and settings
> >   * @cte_esz: collection table entry size
> > @@ -1668,6 +1676,45 @@ static int vgic_register_its_iodev(struct kvm *kvm, struct vgic_its *its,
> >  	return ret;
> >  }
> >  
> > +/* Default is 16 cached LPIs per vcpu */
> > +#define LPI_DEFAULT_PCPU_CACHE_SIZE	16
> > +
> > +void vgic_lpi_translation_cache_init(struct kvm *kvm)
> > +{
> > +	struct vgic_dist *dist = &kvm->arch.vgic;
> > +	unsigned int sz;
> > +	int i;
> > +
> > +	if (!list_empty(&dist->lpi_translation_cache))
> > +		return;
> > +
> > +	sz = atomic_read(&kvm->online_vcpus) * LPI_DEFAULT_PCPU_CACHE_SIZE;
> > +
> > +	for (i = 0; i < sz; i++) {
> > +		struct vgic_translation_cache_entry *cte;
> > +
> > +		/* An allocation failure is not fatal */
> > +		cte = kzalloc(sizeof(*cte), GFP_KERNEL);
> > +		if (WARN_ON(!cte))
> > +			break;
> > +
> > +		INIT_LIST_HEAD(&cte->entry);
> > +		list_add(&cte->entry, &dist->lpi_translation_cache);
> 
> Going through the series, it looks like this list is either empty
> (before the cache init) or has a fixed number
> (LPI_DEFAULT_PCPU_CACHE_SIZE * nr_cpus) of entries.

Well, it could also fail when allocating one of the entries, meaning we
can end up with anywhere from 0 to (LPI_DEFAULT_PCPU_CACHE_SIZE *
nr_cpus) entries.

> And the list never grows nor shrinks throughout the series, so it
> seems odd to be using a list here.
> 
> Is there a reason for not using a dynamically allocated array instead of
> the list? (does list_move() provide a big perf advantage over swapping
> the data from one array entry to another? Or is there some other
> facility I am missing?)

The idea was to make the LRU policy cheap, on the assumption that
list_move (which is only a couple of pointer updates) is cheaper than
a memmove if you want to keep the array ordered. If we exclude the
list head, we end up with 24 bytes per entry to move down to make room
for the new entry at the head of the array. For large caches that miss
very often, this will hurt badly. But is that really a problem? I
don't know.
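
To make the trade-off concrete, here is a rough userspace sketch, not
kernel code. struct node is a hand-rolled stand-in for struct list_head,
struct payload mirrors the patch's entry minus its list head (24 bytes on
a typical LP64 build), and lru_promote_list()/lru_promote_array() are
hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct node {
	struct node *prev, *next;
};

/* Mirrors the cache entry from the patch, minus the list head. */
struct payload {
	uint64_t db;		/* stands in for phys_addr_t */
	uint32_t devid;
	uint32_t eventid;
	void *irq;		/* stands in for struct vgic_irq * */
};

struct list_entry {
	struct node link;
	struct payload p;
};

static void node_init(struct node *head)
{
	head->prev = head->next = head;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void node_add_head(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* List flavour: a promotion is a fixed number of pointer writes. */
static void lru_promote_list(struct node *n, struct node *head)
{
	node_del(n);
	node_add_head(n, head);
}

/* Array flavour: promoting slot idx shifts idx payloads down by one. */
static void lru_promote_array(struct payload *arr, size_t idx)
{
	struct payload hit = arr[idx];

	memmove(&arr[1], &arr[0], idx * sizeof(*arr));
	arr[0] = hit;
}
```

The list version costs the same whatever the cache size, while the array
version moves idx * sizeof(struct payload) bytes per promotion, which is
the cost discussed above.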

We could allocate an array as you suggest, and use a linked list
inside the array. Or something else. I'm definitely open to
suggestions!
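
One possible shape for the "linked list inside an array" variant, again
as a userspace sketch under assumptions: struct link stands in for struct
list_head, calloc() for the kernel allocator, and every cache_* name is
hypothetical. Note that a single allocation changes the failure mode: the
cache is either fully populated or empty, rather than partially filled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-in for struct list_head (illustration only). */
struct link {
	struct link *prev, *next;
};

struct cache_entry {
	struct link link;	/* must stay first: cache_victim() relies on it */
	uint32_t devid;
	uint32_t eventid;
};

struct cache {
	struct link lru;		/* most recently used entry first */
	struct cache_entry *slots;	/* one contiguous allocation */
	size_t nr;
};

/* Allocate every slot at once and thread them onto the LRU list. */
static int cache_init(struct cache *c, size_t nr)
{
	size_t i;

	c->lru.prev = c->lru.next = &c->lru;
	c->nr = 0;
	c->slots = calloc(nr, sizeof(*c->slots));
	if (!c->slots)
		return -1;	/* all or nothing: no partial cache */

	for (i = 0; i < nr; i++) {
		struct link *l = &c->slots[i].link;

		/* Append at the tail of the list. */
		l->prev = c->lru.prev;
		l->next = &c->lru;
		c->lru.prev->next = l;
		c->lru.prev = l;
	}
	c->nr = nr;
	return 0;
}

/* Mark an entry as most recently used: pointer updates only. */
static void cache_touch(struct cache *c, struct cache_entry *e)
{
	struct link *l = &e->link;

	l->prev->next = l->next;
	l->next->prev = l->prev;
	l->next = c->lru.next;
	l->prev = &c->lru;
	c->lru.next->prev = l;
	c->lru.next = l;
}

/* The least recently used entry sits at the tail; NULL if the cache is empty. */
static struct cache_entry *cache_victim(struct cache *c)
{
	if (c->lru.prev == &c->lru)
		return NULL;
	return (struct cache_entry *)c->lru.prev;
}

static void cache_destroy(struct cache *c)
{
	free(c->slots);
	c->slots = NULL;
	c->nr = 0;
	c->lru.prev = c->lru.next = &c->lru;
}
```

This keeps promotions as cheap as with individually allocated entries
while replacing the per-entry allocations with one array.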

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm