Re: [PATCH v2] ARM: KVM: Fix permission fault handling

Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> · Tue, 13 Nov 2012 07:56:55 -0500

On Tue, Nov 13, 2012 at 5:43 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> On 12/11/12 22:55, Christoffer Dall wrote:
>> On Mon, Nov 12, 2012 at 12:02 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>> This patch fixes a very long standing bug in the KVM fault handling and
>>> is basically a backport of the equivalent arm64 code.
>>>
>>> The ARM ARM is quite explicit in saying the HPFAR cannot be trusted
>>> for a permission fault, unless it is a Stage-1 page table walk.
>>>
>>> The fact that it *seems* to work well enough on A15 is even more scary.
>>> We may well be silently corrupting memory without even noticing (hint,
>>> hint...). It is also quite possible that this bug would prevent KVM
>>> from running on a CPU which caches TLBs covering VA to PA with a single
>>> entry.
>>>
>>> The fix is to detect aborts that are permission faults, and not stage-1
>>> page table walk. We can then resolve the IPA, and store it just like the
>>> HPFAR register. If the IPA cannot be resolved, it means another CPU is
>>> playing with the page tables, and we simply restart the guest.
>>>
>>> This also paves the way for lighter Icache invalidation, using the NX bit.
>>
>> My main concern here is the complicated return code and this assembly
>> code is not too easy to read and it duplicates the code in
>> __kvm_va_to_pa. This is supposed to be a rare case, because if not,
>> we're merging the wrong pages anyway, and the system performance will
>> suck.
>
> This is not exactly rare. There's a number of cases were we fault a page
> by reading from it, and then writing to it. A lot of static data in the
> kernel causes this, actually.
>

yes, but unless that page is shared with another process at the point
when it's faulted in we get the page back as writable from
get_user_pages and map it writable in the stage-2 table from the
get-go.

I would say that if this is not rare, and happens a lot, we probably
shouldn't be mapping these pages read-only, but that's for another
discussion.

>> Wouldn't it be nicer to check the fault type in C-code and call the
>> __kvm_va_to_pa ?
>
> I don't find this nicer at all. Yes, it reuses a tiny bit of code, but
> the code path becomes very convoluted (fault, save guest, restore host,
> "oh wait, this was a permission fault", trap, save host, restore guest,
> resolve IPA, restore host, back to square one...) and a lot more expensive.
>
> Even if it wasn't performance critical, the above dance makes me think
> we'd be doing something wrong.
>

fair enough.

>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>>> ---
>>> >From v1:
>>> - Use ATS1CPR instead of ATS1CPW
>>>
>>>  arch/arm/kvm/interrupts.S | 37 ++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 36 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>>> index 5a09e89..b2d52a9 100644
>>> --- a/arch/arm/kvm/interrupts.S
>>> +++ b/arch/arm/kvm/interrupts.S
>>> @@ -401,12 +401,47 @@ guest_trap:
>>>         mrc     p15, 4, r2, c6, c0, 0   @ HDFAR
>>>
>>>  2:     str     r2, [r1, #VCPU_HxFAR]
>>> -       mrc     p15, 4, r2, c6, c0, 4   @ HPFAR
>>> +
>>> +       /*
>>> +        * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
>>> +        *
>>> +        * Abort on the stage 2 translation for a memory access from a
>>> +        * Non-secure PL1 or PL0 mode:
>>> +        *
>>> +        * For any Access flag fault or Translation fault, and also for any
>>> +        * Permission fault on the stage 2 translation of a memory access
>>> +        * made as part of a translation table walk for a stage 1 translation,
>>> +        * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
>>> +        * is UNKNOWN.
>>> +        */
>>> +
>>> +       /* Check for permission fault, and S1PTW */
>>> +       mrc     p15, 4, r1, c5, c2, 0   @ HSR
>>> +       and     r0, r1, #HSR_FSC_TYPE
>>> +       cmp     r0, #FSC_PERM
>>> +       tsteq   r1, #7                  @ S1PTW
>>> +       mrcne   p15, 4, r2, c6, c0, 4   @ HPFAR
>>> +       bne     3f
>>> +
>>> +       /* Resolve IPA using the xFAR */
>>> +       mcr     p15, 0, r2, c7, c8, 0   @ ATS1CPR
>>
>> isn't there a race where guest user mode code writes some address
>> which it can write, then we take the stage-2 fault, then another CPU
>            can't?
>> changes the VA mapping to be accessible only in PL1, but user space
>> code now gets a mapping to that memory which goes in the TLB and stays
>> even after we return from handling the exception?
>
> It really doesn't matter. We do not grant extra permission to the pages
> with Stage-2. When we resume the guest after handling the permission
> fault (granting write access), it will evaluate the translation again,
> and fault in the guest this time.
>
>         M.
> --
> Jazz is not dead. It just smells funny...
>
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm