Re: [PATCH kernel v5 10/10] KVM: PPC: VFIO: Add in-kernel acceleration for VFIO

Alexey Kardashevskiy <aik@xxxxxxxxx> · Mon, 27 Feb 2017 14:20:13 +1100

On 27/02/17 12:53, David Gibson wrote:
> On Fri, Feb 24, 2017 at 02:43:05PM +1100, Alexey Kardashevskiy wrote:
>> On 24/02/17 14:36, David Gibson wrote:
>>> On Fri, Feb 24, 2017 at 02:29:14PM +1100, Alexey Kardashevskiy wrote:
>>>> On 24/02/17 13:14, David Gibson wrote:
>>>>> On Wed, Feb 22, 2017 at 07:21:33PM +1100, Alexey Kardashevskiy
>> wrote:
> [snip]
>>>>>> +static long kvmppc_rm_tce_iommu_unmap(struct kvm *kvm,
>>>>>> +		struct iommu_table *tbl, unsigned long entry)
>>>>>> +{
>>>>>> +	enum dma_data_direction dir = DMA_NONE;
>>>>>> +	unsigned long hpa = 0;
>>>>>> +	long ret;
>>>>>> +
>>>>>> +	if (iommu_tce_xchg_rm(tbl, entry, &hpa, &dir))
>>>>>> +		return H_HARDWARE;
>>>>>
>>>>> To avoid a double WARN() (and to make the warnings easier to
>>>>> understand) I'd suggest putting a WARN_ON() here, rather than in the
>>>>> callers when they receieve an H_HARDWARE.  IIUC this really shouldn't
>>>>> ever happen, and it certainly can't be the guest's fault?
>>>>
>>>>
>>>> Makes sense.
>>>
>>> I guess it might want WARN_ON_ONCE() to avoid spamming the user with
>>> errors for every TCE, though.
>>
>>
>> We do not expect this to happen at all :) I can convert all of them to
>> _ONCE really as the purpose of WARN_ON is mostly to document what we do not
>> expect.
> 
> Sure, seems reasonable.
> 
> [snip]
>>>>>> @@ -220,9 +338,10 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>>>>>  {
>>>>>>  	struct kvmppc_spapr_tce_table *stt;
>>>>>>  	long i, ret = H_SUCCESS;
>>>>>> -	unsigned long tces, entry, ua = 0;
>>>>>> +	unsigned long tces, entry, ua = 0, tce, gpa;
>>>>>>  	unsigned long *rmap = NULL;
>>>>>>  	bool prereg = false;
>>>>>> +	struct kvmppc_spapr_tce_iommu_table *stit;
>>>>>>  
>>>>>>  	stt = kvmppc_find_table(vcpu->kvm, liobn);
>>>>>>  	if (!stt)
>>>>>> @@ -287,12 +406,24 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
>>>>>>  	}
>>>>>>  
>>>>>>  	for (i = 0; i < npages; ++i) {
>>>>>> -		unsigned long tce = be64_to_cpu(((u64 *)tces)[i]);
>>>>>> +		tce = be64_to_cpu(((u64 *)tces)[i]);
>>>>>>  
>>>>>>  		ret = kvmppc_tce_validate(stt, tce);
>>>>>>  		if (ret != H_SUCCESS)
>>>>>>  			goto unlock_exit;
>>>>>>  
>>>>>> +		gpa = tce & ~(TCE_PCI_READ | TCE_PCI_WRITE);
>>>>>> +
>>>>>> +		list_for_each_entry_lockless(stit, &stt->iommu_tables, next) {
>>>>>> +			ret = kvmppc_rm_tce_iommu_map(vcpu->kvm,
>>>>>> +					stit->tbl, entry + i, gpa,
>>>>>> +					iommu_tce_direction(tce));
>>>>>> +			if (WARN_ON_ONCE(ret == H_HARDWARE))
>>>>>
>>>>> I don't think you need the WARN() here - the only H_HARDWARE failure
>>>>> path in iommu_map() already includes a WARN().
>>>>
>>>>
>>>> True, I can drop it here.
>>>>
>>>>
>>>>>
>>>>>> +				kvmppc_rm_clear_tce(stit->tbl, entry);
>>>>>> +			else if (ret != H_SUCCESS)
>>>>>> +				goto unlock_exit;
>>>>>
>>>>> It's also not clear to me why the H_HARDWARE error path clears the
>>>>> entry, but the other failure paths don't.  Or why an H_HARDWARE will
>>>>> result in continuing to set the rest of the TCEs, but other failures
>>>>> won't.
>>>>
>>>>
>>>> The idea was that other failures still have some chance that handling may
>>>> succeed in virtual mode or via QEMU, H_HARDWARE is fatal.
>>>
>>> Um... yes.. but the logic seems to be backwards for that: on
>>> H_HARDWARE you warn and keep going, on other errors you bail out
>>> entirely.
>>
>> By "fatal" I means fatal for this particular hardware TCE(s), no hope in
>> trying this particular TCE in virtual mode.
> 
> Ok... still not following why that means the "fatal" error results in
> continuing to attempt for the rest of the updated TCEs, whereas the
> "non fatal" one bails out.

I was applying the principle that if after all checks done we still cannot
update the hardware table, then just clear the TCE and move on. Or I
misunderstood the idea?

> Especially since the bail out will only go
> to virtual mode if ret == H_TOO_HARD, which it isn't clear is the only
> possibility.

H_TOO_HARD goes to virtual mode, H_TOO_HARD in virtual goes to the
userspace (QEMU).

Will "if (WARN_ON_ONCE(ret != H_SUCCESS && ret != H_TOO_HARD))" make more
sense?

> 
>>>> I am just not sure if H_PARAMETER is what I want to return at [1], to make
>>>> the calling code simplier, I could return H_HARDWARE there as well (instead
>>>> of H_PARAMETER).
>>>
>>> That sounds right, IIUC the gpa to ua translation shouldn't ever
>>> fail because of something the guest did. 
>>
>>
>> The guest can easily pass bad TCE/GPA which is not in any registered slot.
>> So it is rather H_PARAMETER.
> 
> Ah, yes.
> 
>>> So I'd expect either
>>> H_HARDWARE, or H_TOO_HARD (if there's some hope that virtual mode can
>>> make the translation when real mode couldn't).
>>
>> No, virtual mode uses the exact same helper.
> 
> Ok.
> 

-- 
Alexey

Attachment:
signature.asc

Description: OpenPGP digital signature