Hi Marc,
On 2020/2/18 23:31, Marc Zyngier wrote:
> Hi Zenghui,
>
> On 2020-02-18 09:27, Marc Zyngier wrote:
>> Hi Zenghui,
>>
>> On 2020-02-18 07:00, Zenghui Yu wrote:
>>> Hi Marc,
>>>
>>> [...]
>>> There might be a race on reading 'vpe->col_idx' against a
>>> concurrent vPE schedule (col_idx will be modified in
>>> its_vpe_set_affinity())? Will we end up accessing the GICR_VSGI*
>>> registers of the old redistributor while the vPE is now resident
>>> on the new one? And if so, is it harmful?
>> Very well spotted. There is a potential problem if the old and new
>> RDs are not part of the same CommonLPIAff group.
>>> The same question applies to direct_lpi_inv(), where
>>> 'vpe->col_idx' will be used in irq_to_cpuid().
>> Same problem indeed. We need to ensure that no VMOVP operation can
>> occur whilst we use col_idx to access a redistributor. This means a
>> vPE lock of some sort that will protect the affinity.

Yeah, I had the same view here; a vPE-level lock might help.
>> But I think there is a slightly more general problem here, which we
>> failed to see initially: the same issue exists for physical LPIs,
>> as col_map[] can be updated (its_set_affinity()) in parallel with a
>> direct invalidate. The good old invalidation through the ITS does
>> guarantee that the two operations don't overlap, but direct
>> invalidation breaks it.

Agreed!
>> Let me have a think about it.

> So I've thought about it, wrote a patch, and I don't really like the
> look of it. This is pretty invasive, and we end up serializing a lot
> more than we used to (the repurposing of vlpi_lock as a general "lpi
> mapping lock" is probably too coarse).
>
> It of course needs splitting over at least three patches, but it'd
> be good if you could have a look (it applies on top of the whole
> series).

So the first issue is that:

1. There are races on choosing the RD against a concurrent LPI/vPE
affinity change.

And sure, I will have a look at the following patch! But I'd first
like to mention some other issues I've seen today...
2. Another potential race is on accessing the same RD by different
CPUs, which becomes more obvious with the introduction of GICv4.1.
We can take at least two registers as examples:

- GICR_VSGIR:
Let's assume that vPE0 has just been descheduled from CPU0 and vPE1
is then scheduled on. CPU0 is writing its GICR_VSGIR with vpeid1 to
serve vPE1's GICR_ISPENDR0 read trap, whilst userspace is getting the
vSGI pending state of vPE0 (e.g., via a debugfs read), so another CPU
will try to write the same GICR_VSGIR with vpeid0... without waiting
for GICR_VSGIPENDR.Busy to read as 0.
This is CONSTRAINED UNPREDICTABLE behavior per the spec, and at least
one of the two queries will fail.
- GICR_INV{LPI,ALL}R:
Multiple LPIs can be targeted at the same RD, thus multiple writes to
the same GICR_INVLPIR (with different INTID, even with different V)
can happen concurrently...
The above comes from the fact that the same redistributor can be
accessed (concurrently) by multiple CPUs, while we don't have any
mechanism to ensure some extent of serialization. I also had a look
at how KVM will handle this kind of access, but:

3. It looks like KVM makes the assumption that the per-RD MMIO region
will only be accessed by the associated VCPU? I don't think the
architecture restricts it that way, so we can do better. Or I've just
missed some important points here.
I will look at the following patch ASAP but may need some time to
think about all of the above, and do some fixes if possible :-)
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7656b353a95f..0ed286dba827 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
[...]
Thanks,
Zenghui