Hi Marc,
On 2019/6/25 20:31, Marc Zyngier wrote:
Hi Zenghui,
On 25/06/2019 12:50, Zenghui Yu wrote:
Hi Marc,
On 2019/6/12 1:03, Marc Zyngier wrote:
On a successful translation, preserve the parameters in the LPI
translation cache. Each translation is reusing the last slot
in the list, naturally evincting the least recently used entry.
Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
---
virt/kvm/arm/vgic/vgic-its.c | 86 ++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 0aa0cbbc3af6..62932458476a 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -546,6 +546,90 @@ static unsigned long vgic_mmio_read_its_idregs(struct kvm *kvm,
return 0;
}
+static struct vgic_irq *__vgic_its_check_cache(struct vgic_dist *dist,
+ phys_addr_t db,
+ u32 devid, u32 eventid)
+{
+ struct vgic_translation_cache_entry *cte;
+ struct vgic_irq *irq = NULL;
+
+ list_for_each_entry(cte, &dist->lpi_translation_cache, entry) {
+ /*
+ * If we hit a NULL entry, there is nothing after this
+ * point.
+ */
+ if (!cte->irq)
+ break;
+
+ if (cte->db == db &&
+ cte->devid == devid &&
+ cte->eventid == eventid) {
+ /*
+ * Move this entry to the head, as it is the
+ * most recently used.
+ */
+ list_move(&cte->entry, &dist->lpi_translation_cache);
Only for performance reasons: if we hit at the "head" of the list, we
don't need to do a list_move().
In our tests, we found that a single list_move() takes nearly (sometimes
even more than) one microsecond, for some unknown reason...
Huh... That's odd.
Can you narrow down under which conditions this happens? I'm not sure if
checking for the list head would be more efficient, as you end-up
fetching the head anyway. Can you try replacing this line with:
if (!list_is_first(&cte->entry, &dist->lpi_translation_cache))
list_move(&cte->entry, &dist->lpi_translation_cache);
and let me know whether it helps?
It helps. With this change, the overhead of list_move() is gone.
We run 16 4-vcpu VMs on the host, each with a vhost-user nic, and run
"iperf" in pairs between them. It's likely to hit at the head of the
cache list in our tests.
With this change, the sys% utilization of vhostdpfwd threads will
decrease by about 10%. But I don't know the reason exactly (I haven't
found any clues in code yet, in implementation of list_move...).
Thanks,
zenghui