Re: [PATCH v6 14/24] KVM: arm64: vgic-v3: vgic_v3_lpi_sync_pending_status

Auger Eric <eric.auger@xxxxxxxxxx> · Fri, 5 May 2017 16:20:42 +0200

Hi Christoffer, Marc,

On 05/05/2017 12:35, Marc Zyngier wrote:
> On 05/05/17 11:10, Christoffer Dall wrote:
>> On Fri, May 05, 2017 at 10:59:09AM +0100, Marc Zyngier wrote:
>>> On 05/05/17 10:45, Auger Eric wrote:
>>>> Hi,
>>>>
>>>> On 05/05/2017 10:11, Christoffer Dall wrote:
>>>>> On Thu, May 04, 2017 at 01:44:34PM +0200, Eric Auger wrote:
>>>>>> this new helper synchronizes the irq pending_latch
>>>>>> with the LPI pending bit status found in rdist pending table.
>>>>>> As the status is consumed, we reset the bit in pending table.
>>>>>>
>>>>>> As we need the PENDBASER_ADDRESS() in vgic-v3, let's move its
>>>>>> definition in the irqchip header. We restore the full length
>>>>>> of the field, ie [51:16]. Same for PROPBASER_ADDRESS with full
>>>>>> field length of [51:12].
>>>>>
>>>>> why into irqchip and not just the vgic header file?
>>>> Well most register field shift/masks are located there. This may be
>>>> useful as well for the ITS driver if power management gets implemented
>>>
>>> Yeah, I'm fine with that. Having all of the HW description in one single
>>> place makes sense.
>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v6: new
>>>>>> ---
>>>>>>  include/linux/irqchip/arm-gic-v3.h |  2 ++
>>>>>>  virt/kvm/arm/vgic/vgic-its.c       |  6 ++----
>>>>>>  virt/kvm/arm/vgic/vgic-v3.c        | 44 ++++++++++++++++++++++++++++++++++++++
>>>>>>  virt/kvm/arm/vgic/vgic.h           |  1 +
>>>>>>  4 files changed, 49 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
>>>>>> index 9519c7b..e09e5d7 100644
>>>>>> --- a/include/linux/irqchip/arm-gic-v3.h
>>>>>> +++ b/include/linux/irqchip/arm-gic-v3.h
>>>>>> @@ -159,6 +159,8 @@
>>>>>>  #define GICR_PROPBASER_RaWaWb	GIC_BASER_CACHEABILITY(GICR_PROPBASER, INNER, RaWaWb)
>>>>>>  
>>>>>>  #define GICR_PROPBASER_IDBITS_MASK			(0x1f)
>>>>>> +#define GICR_PROPBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 12))
>>>>>> +#define GICR_PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(51, 16))
>>>>>>  
>>>>>>  #define GICR_PENDBASER_SHAREABILITY_SHIFT		(10)
>>>>>>  #define GICR_PENDBASER_INNER_CACHEABILITY_SHIFT		(7)
>>>>>> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
>>>>>> index e7bb86a..f43ea30c 100644
>>>>>> --- a/virt/kvm/arm/vgic/vgic-its.c
>>>>>> +++ b/virt/kvm/arm/vgic/vgic-its.c
>>>>>> @@ -198,8 +198,6 @@ static struct its_ite *find_ite(struct vgic_its *its, u32 device_id,
>>>>>>   */
>>>>>>  #define BASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 16))
>>>>>>  #define CBASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 12))
>>>>>> -#define PENDBASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 16))
>>>>>> -#define PROPBASER_ADDRESS(x)	((x) & GENMASK_ULL(47, 12))
>>>>>>  
>>>>>>  #define GIC_LPI_OFFSET 8192
>>>>>>  
>>>>>> @@ -234,7 +232,7 @@ static struct its_collection *find_collection(struct vgic_its *its, int coll_id)
>>>>>>  static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
>>>>>>  			     struct kvm_vcpu *filter_vcpu)
>>>>>>  {
>>>>>> -	u64 propbase = PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
>>>>>> +	u64 propbase = GICR_PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
>>>>>>  	u8 prop;
>>>>>>  	int ret;
>>>>>>  
>>>>>> @@ -346,7 +344,7 @@ static u32 max_lpis_propbaser(u64 propbaser)
>>>>>>   */
>>>>>>  static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>> -	gpa_t pendbase = PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
>>>>>> +	gpa_t pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
>>>>>>  	struct vgic_irq *irq;
>>>>>>  	int last_byte_offset = -1;
>>>>>>  	int ret = 0;
>>>>>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
>>>>>> index be0f4c3..0d753ae 100644
>>>>>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>>>>>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>>>>>> @@ -252,6 +252,50 @@ void vgic_v3_enable(struct kvm_vcpu *vcpu)
>>>>>>  	vgic_v3->vgic_hcr = ICH_HCR_EN;
>>>>>>  }
>>>>>>  
>>>>>> +int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>>>>>> +{
>>>>>> +	struct kvm_vcpu *vcpu;
>>>>>> +	int byte_offset, bit_nr;
>>>>>> +	gpa_t pendbase, ptr;
>>>>>> +	bool status;
>>>>>> +	u8 val;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +retry:
>>>>>> +	vcpu = irq->target_vcpu;
>>>>>> +	if (!vcpu)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	pendbase = GICR_PENDBASER_ADDRESS(vcpu->arch.vgic_cpu.pendbaser);
>>>>>> +
>>>>>> +	byte_offset = irq->intid / BITS_PER_BYTE;
>>>>>> +	bit_nr = irq->intid % BITS_PER_BYTE;
>>>>>> +	ptr = pendbase + byte_offset;
>>>>>> +
>>>>>> +	ret = kvm_read_guest(kvm, ptr, &val, 1);
>>>>>> +	if (ret)
>>>>>> +		return ret;
>>>>>> +
>>>>>> +	status = val & (1 << bit_nr);
>>>>>> +
>>>>>> +	spin_lock(&irq->irq_lock);
>>>>>> +	if (irq->target_vcpu != vcpu) {
>>>>>> +		spin_unlock(&irq->irq_lock);
>>>>>> +		goto retry;
>>>>>
>>>>> Can the guest be continuously changing the configuration of the LPI and
>>>>> cause this function to be called, which will effectively hog this CPU
>>>>> from the system, or am I being overly cautious here?
>>>> Yes but on the other hand, there is a risk the target_vcpu has changed,
>>>> isn't there? So the alternative would be to return -EBUSY, and the caller,
>>>> i.e. its_add_lpi, would return that error?
>>>
>>> For the guest to be changing the LPI configuration, it would take a
>>> command to be issued. If there is a concern that we're racing against a
>>> command, can we take the command queue mutex?
>>>
>>
>> So this is a redistributor thing, not an ITS thing, so I'd prefer not
>> solving it that way.
> 
> Indeed.
> 
>> I was just thinking if we should do a limit of the number of times we'll
>> do this or check if we have a pending signal and just return, but
>> actually, I don't think this is a problem, because the path that keeps
>> changing the configuration will eventually run out of CPU resources and
>> this thread would have forward progress.
> 
> Yes, assuming we're not holding any other spinlock on the same path.
> Looking at the code, we only seem to come here from handling MAPI/MAPTI,
> so we should be good.
> 
>>>>>
>>>>>> +	}
>>>>>> +	irq->pending_latch = status;
>>>>>> +	vgic_queue_irq_unlock(vcpu->kvm, irq);
>>>>>> +
>>>>>> +	if (status) {
>>>>>> +		/* clear consumed data */
>>>>>> +		val &= ~(1 << bit_nr);
>>>>>> +		ret = kvm_write_guest(kvm, ptr, &val, 1);
>>>>>> +		if (ret)
>>>>>> +			return ret;
>>>>>
>>>>> Do we have a problem that if this is done twice within the same byte (on
>>>>> different LPIs) then the data could be strangely out of sync?
>>>> Not sure I get what you mean here? I reset a single bit within the byte.
>>>> Do you mean there could be a concurrency issue?
>>>>
>>>> In principle we are not obliged to reset the bit, right? Why do we care?
>>>> The table will be updated on next pending table save.
>>>
>>> 1) the guest should never write to the pending table.
>>> 2) if there is an interrupt being injected, it will be made pending in
>>> the irq structure, and not in the PT.
>>
>> If you have two separate cores running this function at the same time,
>> each trying to clear a bit in the same word, wouldn't you lose one of
>> the cleared bits?
> 
> Ah, I see what you mean now (I was obviously looking at the wrong end of
> the problem).
> 
>> Maybe that can never happen because the commands would be serialized and
>> we should be restoring the system in serial as well?
> 
> I think that should be the case. What does it mean to do two restores in
> parallel? We should probably enforce this. Eric?
We can't have 2 restores in parallel as we hold the kvm->lock all along.
MAPI commands are serialized by its_cmd_lock. MAPI and restore cannot
happen concurrently since vcpu is stopped and kvm_lock is held during
GITS_CWRITER setting too. So my understanding is that this race cannot happen.
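
For illustration only, the lost update Christoffer describes can be
sketched in plain user-space C. This is not KVM code: byte_offset(),
bit_mask(), clear_serialized() and clear_interleaved() are hypothetical
helpers, and the "interleaving" is simulated in a single thread rather
than run on two cores. Only the intid / BITS_PER_BYTE split mirrors the
patch. It shows why two read-modify-write sequences on the same byte are
safe when serialized and lose a cleared bit when both reads happen
before either write.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Byte of the pending table that holds the bit for this interrupt ID. */
static uint32_t byte_offset(uint32_t intid)
{
	return intid / BITS_PER_BYTE;
}

/* Mask selecting this interrupt ID's bit within its byte. */
static uint8_t bit_mask(uint32_t intid)
{
	return (uint8_t)(1u << (intid % BITS_PER_BYTE));
}

/*
 * Serialized case: A's read-modify-write completes before B's starts,
 * which is what holding a common lock across both would guarantee.
 */
static uint8_t clear_serialized(uint8_t byte, uint32_t intid_a, uint32_t intid_b)
{
	uint8_t val;

	assert(byte_offset(intid_a) == byte_offset(intid_b));

	val = byte;				/* A reads */
	val &= (uint8_t)~bit_mask(intid_a);
	byte = val;				/* A writes */

	val = byte;				/* B reads A's result */
	val &= (uint8_t)~bit_mask(intid_b);
	byte = val;				/* B writes */

	return byte;
}

/*
 * Unserialized case (simulated): both callers read the byte before
 * either writes it back, so B's write restores the bit A cleared.
 */
static uint8_t clear_interleaved(uint8_t byte, uint32_t intid_a, uint32_t intid_b)
{
	uint8_t val_a, val_b;

	assert(byte_offset(intid_a) == byte_offset(intid_b));

	val_a = byte;				/* A reads */
	val_b = byte;				/* B reads the same stale value */
	val_a &= (uint8_t)~bit_mask(intid_a);
	val_b &= (uint8_t)~bit_mask(intid_b);
	byte = val_a;				/* A writes */
	byte = val_b;				/* B overwrites A's update */

	return byte;
}
```

With LPIs 8192 and 8193 both pending (same byte, bits 0 and 1), the
serialized variant leaves the byte fully cleared, while the interleaved
one leaves bit 0 set again.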

Thanks

Eric
> 
> 	M.
>