On Tue, 25 Jun 2019 17:00:54 +0100,
Zenghui Yu <yuzenghui@xxxxxxxxxx> wrote:
>
> Hi Marc,
>
> On 2019/6/25 20:31, Marc Zyngier wrote:
> > Hi Zenghui,
> >
> > On 25/06/2019 12:50, Zenghui Yu wrote:
> >> Hi Marc,
> >>
> >> On 2019/6/12 1:03, Marc Zyngier wrote:
> >>> On a successful translation, preserve the parameters in the LPI
> >>> translation cache. Each translation is reusing the last slot
> >>> in the list, naturally evicting the least recently used entry.
> >>>
> >>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
> >>> ---
> >>>   virt/kvm/arm/vgic/vgic-its.c | 86 ++++++++++++++++++++++++++++++++++++
> >>>   1 file changed, 86 insertions(+)
> >>>
> >>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> >>> index 0aa0cbbc3af6..62932458476a 100644
> >>> --- a/virt/kvm/arm/vgic/vgic-its.c
> >>> +++ b/virt/kvm/arm/vgic/vgic-its.c
> >>> @@ -546,6 +546,90 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
> >>>  	return 0;
> >>>  }
> >>> +static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
> >>> +					       phys_addr_t db,
> >>> +					       u32 devid, u32 eventid)
> >>> +{
> >>> +	struct vgic_translation_cache_entry *cte;
> >>> +	struct vgic_irq *irq = NULL;
> >>> +
> >>> +	list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
> >>> +		/*
> >>> +		 * If we hit a NULL entry, there is nothing after this
> >>> +		 * point.
> >>> +		 */
> >>> +		if (!cte->irq)
> >>> +			break;
> >>> +
> >>> +		if (cte->db == db &&
> >>> +		    cte->devid == devid &&
> >>> +		    cte->eventid == eventid) {
> >>> +			/*
> >>> +			 * Move this entry to the head, as it is the
> >>> +			 * most recently used.
> >>> +			 */
> >>> +			list_move(&cte->entry, &dist->lpi_translation_cache);
> >>
> >> Only for performance reasons: if we hit at the "head" of the list, we
> >> don't need to do a list_move().
> >> In our tests, we found that a single list_move() takes nearly (sometimes
> >> even more than) one microsecond, for some unknown reason...
> >
> > Huh... That's odd.
> >
> > Can you narrow down under which conditions this happens? I'm not sure
> > if checking for the list head would be more efficient, as you end up
> > fetching the head anyway. Can you try replacing this line with:
> >
> > 	if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
> > 		list_move(&cte->entry, &dist->lpi_translation_cache);
> >
> > and let me know whether it helps?
>
> It helps. With this change, the overhead of list_move() is gone.
>
> We run 16 4-vcpu VMs on the host, each with a vhost-user nic, and run
> "iperf" in pairs between them. It's likely to hit at the head of the
> cache list in our tests.
> With this change, the sys% utilization of vhostdpfwd threads will
> decrease by about 10%. But I don't know the reason exactly (I haven't
> found any clues in the code yet, in the implementation of list_move...).

list_move() is rather simple, and shouldn't be hard to execute quickly.
The only contention I can imagine is that as the cache line is held by
multiple CPUs, the update to the list pointers causes an invalidation
to be sent to the other CPUs, leading to a slower update. But it remains
that 500ns is a pretty long time (that's 1000 cycles on a 2GHz CPU).

It'd be interesting to throw perf at this and see what shows up. It
would give us a clue about what is going on here.

Thanks,

	M.

-- 
Jazz is not dead, it just smells funny.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm