Re: [PATCH v2 7/9] KVM: arm/arm64: vgic-its: Cache successful MSI->LPI translation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 25 Jun 2019 17:00:54 +0100,
Zenghui Yu <yuzenghui@xxxxxxxxxx> wrote:
> 
> Hi Marc,
> 
> On 2019/6/25 20:31, Marc Zyngier wrote:
> > Hi Zenghui,
> > 
> > On 25/06/2019 12:50, Zenghui Yu wrote:
> >> Hi Marc,
> >> 
> >> On 2019/6/12 1:03, Marc Zyngier wrote:
> >>> On a successful translation, preserve the parameters in the LPI
> >>> translation cache. Each translation is reusing the last slot
> >>> in the list, naturally evincting the least recently used entry.
> >>> 
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> >>> ---
> >>>    virt/kvm/arm/vgic/vgic-its.c | 86 ++++++++++++++++++++++++++++++++++++
> >>>    1 file changed, 86 insertions(+)
> >>> 
> >>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> >>> index 0aa0cbbc3af6..62932458476a 100644
> >>> --- a/virt/kvm/arm/vgic/vgic-its.c
> >>> +++ b/virt/kvm/arm/vgic/vgic-its.c
> >>> @@ -546,6 +546,90 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
> >>>    	return 0;
> >>>    }
> >>>    +static struct vgic_irq *__vgic_its_check_cache(struct
> >>> vgic_dist *dist,
> >>> +					       phys_addr_t db,
> >>> +					       u32 devid, u32 eventid)
> >>> +{
> >>> +	struct vgic_translation_cache_entry *cte;
> >>> +	struct vgic_irq *irq = NULL;
> >>> +
> >>> +	list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
> >>> +		/*
> >>> +		 * If we hit a NULL entry, there is nothing after this
> >>> +		 * point.
> >>> +		 */
> >>> +		if (!cte->irq)
> >>> +			break;
> >>> +
> >>> +		if (cte->db == db &&
> >>> +		    cte->devid == devid &&
> >>> +		    cte->eventid == eventid) {
> >>> +			/*
> >>> +			 * Move this entry to the head, as it is the
> >>> +			 * most recently used.
> >>> +			 */
> >>> +			list_move(&cte->entry, &dist->lpi_translation_cache);
> >> 
> >> Only for performance reasons: if we hit at the "head" of the list, we
> >> don't need to do a list_move().
> >> In our tests, we found that a single list_move() takes nearly (sometimes
> >> even more than) one microsecond, for some unknown reason...
> > 
> > Huh... That's odd.
> > 
> > Can you narrow down under which conditions this happens? I'm not sure if
> > checking for the list head would be more efficient, as you end-up
> > fetching the head anyway. Can you try replacing this line with:
> > 
> > 	if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
> > 		list_move(&cte->entry, &dist->lpi_translation_cache);
> > 
> > and let me know whether it helps?
> 
> It helps. With this change, the overhead of list_move() is gone.
> 
> We run 16 4-vcpu VMs on the host, each with a vhost-user nic, and run
> "iperf" in pairs between them.  It's likely to hit at the head of the
> cache list in our tests.
> With this change, the sys% utilization of vhostdpfwd threads will
> decrease by about 10%.  But I don't know the reason exactly (I haven't
> found any clues in code yet, in implementation of list_move...).

list_move is rather simple, and shouldn't be too hard to execute
quickly. The only contention I can imagine is that as the cache line
is held by multiple CPUs, the update to the list pointers causes an
invalidation to be sent to other CPUs, leading to a slower update.

But it remains that 500ns is a pretty long time (that's 1000 cycles on
a 2GHz CPU). It'd be interesting to throw perf at this and see shows
up. It would give us a clue about what is going on here.

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



[Index of Archives]     [Linux KVM]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux