Re: [PATCH v2] ARM: KVM: Fix permission fault handling

Christoffer Dall <c.dall@xxxxxxxxxxxxxxxxxxxxxx> · Tue, 13 Nov 2012 08:28:42 -0500

On Tue, Nov 13, 2012 at 8:08 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> On 13/11/12 12:56, Christoffer Dall wrote:
>> On Tue, Nov 13, 2012 at 5:43 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>> On 12/11/12 22:55, Christoffer Dall wrote:
>>>> On Mon, Nov 12, 2012 at 12:02 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>>>> This patch fixes a very long standing bug in the KVM fault handling and
>>>>> is basically a backport of the equivalent arm64 code.
>>>>>
>>>>> The ARM ARM is quite explicit in saying the HPFAR cannot be trusted
>>>>> for a permission fault, unless it is a Stage-1 page table walk.
>>>>>
>>>>> The fact that it *seems* to work well enough on A15 is even more scary.
>>>>> We may well be silently corrupting memory without even noticing (hint,
>>>>> hint...). It is also quite possible that this bug would prevent KVM
>>>>> from running on a CPU which caches TLBs covering VA to PA with a single
>>>>> entry.
>>>>>
>>>>> The fix is to detect aborts that are permission faults, and not stage-1
>>>>> page table walk. We can then resolve the IPA, and store it just like the
>>>>> HPFAR register. If the IPA cannot be resolved, it means another CPU is
>>>>> playing with the page tables, and we simply restart the guest.
>>>>>
>>>>> This also paves the way for lighter Icache invalidation, using the NX bit.
>>>>
>>>> My main concern here is the complicated return code and this assembly
>>>> code is not too easy to read and it duplicates the code in
>>>> __kvm_va_to_pa. This is supposed to be a rare case, because if not,
>>>> we're merging the wrong pages anyway, and the system performance will
>>>> suck.
>>>
>>> This is not exactly rare. There's a number of cases were we fault a page
>>> by reading from it, and then writing to it. A lot of static data in the
>>> kernel causes this, actually.
>>>
>>
>> yes, but unless that page is shared with another process at the point
>> when it's faulted in we get the page back as writable from
>> get_user_pages and map it writable in the stage-2 table from the
>> get-go.
>
> Are you sure? It looks like we only mark it writable if it is a write
> access...
>
So we check the writeable return value for whether we should map it
read-only or writeable, and the KVM code should tell is a page is
writable if it actually is, only __get_user_pages_fast is not defined
on ARM, so you're actually right, we map everything as read-only
that's faulted in on a read.

This shouldn't be anything else than the kernel image though, after
decompression I would be surprised to see many pages faulted in on a
write. If it's something that happens a lot, we should really consider
optimizing it an not take a double stage-2 fault on private writeable
pages.
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm