On Tue, Nov 13, 2012 at 5:43 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: > On 12/11/12 22:55, Christoffer Dall wrote: >> On Mon, Nov 12, 2012 at 12:02 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: >>> This patch fixes a very long standing bug in the KVM fault handling and >>> is basically a backport of the equivalent arm64 code. >>> >>> The ARM ARM is quite explicit in saying the HPFAR cannot be trusted >>> for a permission fault, unless it is a Stage-1 page table walk. >>> >>> The fact that it *seems* to work well enough on A15 is even more scary. >>> We may well be silently corrupting memory without even noticing (hint, >>> hint...). It is also quite possible that this bug would prevent KVM >>> from running on a CPU which caches TLBs covering VA to PA with a single >>> entry. >>> >>> The fix is to detect aborts that are permission faults, and not stage-1 >>> page table walk. We can then resolve the IPA, and store it just like the >>> HPFAR register. If the IPA cannot be resolved, it means another CPU is >>> playing with the page tables, and we simply restart the guest. >>> >>> This also paves the way for lighter Icache invalidation, using the NX bit. >> >> My main concern here is the complicated return code and this assembly >> code is not too easy to read and it duplicates the code in >> __kvm_va_to_pa. This is supposed to be a rare case, because if not, >> we're merging the wrong pages anyway, and the system performance will >> suck. > > This is not exactly rare. There's a number of cases were we fault a page > by reading from it, and then writing to it. A lot of static data in the > kernel causes this, actually. > yes, but unless that page is shared with another process at the point when it's faulted in we get the page back as writable from get_user_pages and map it writable in the stage-2 table from the get-go. I would say that if this is not rare, and happens a lot, we probably shouldn't be mapping these pages read-only, but that's for another discussion. >> Wouldn't it be nicer to check the fault type in C-code and call the >> __kvm_va_to_pa ? > > I don't find this nicer at all. Yes, it reuses a tiny bit of code, but > the code path becomes very convoluted (fault, save guest, restore host, > "oh wait, this was a permission fault", trap, save host, restore guest, > resolve IPA, restore host, back to square one...) and a lot more expensive. > > Even if it wasn't performance critical, the above dance makes me think > we'd be doing something wrong. > fair enough. >>> >>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx> >>> --- >>> >From v1: >>> - Use ATS1CPR instead of ATS1CPW >>> >>> arch/arm/kvm/interrupts.S | 37 ++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 36 insertions(+), 1 deletion(-) >>> >>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S >>> index 5a09e89..b2d52a9 100644 >>> --- a/arch/arm/kvm/interrupts.S >>> +++ b/arch/arm/kvm/interrupts.S >>> @@ -401,12 +401,47 @@ guest_trap: >>> mrc p15, 4, r2, c6, c0, 0 @ HDFAR >>> >>> 2: str r2, [r1, #VCPU_HxFAR] >>> - mrc p15, 4, r2, c6, c0, 4 @ HPFAR >>> + >>> + /* >>> + * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode: >>> + * >>> + * Abort on the stage 2 translation for a memory access from a >>> + * Non-secure PL1 or PL0 mode: >>> + * >>> + * For any Access flag fault or Translation fault, and also for any >>> + * Permission fault on the stage 2 translation of a memory access >>> + * made as part of a translation table walk for a stage 1 translation, >>> + * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR >>> + * is UNKNOWN. >>> + */ >>> + >>> + /* Check for permission fault, and S1PTW */ >>> + mrc p15, 4, r1, c5, c2, 0 @ HSR >>> + and r0, r1, #HSR_FSC_TYPE >>> + cmp r0, #FSC_PERM >>> + tsteq r1, #7 @ S1PTW >>> + mrcne p15, 4, r2, c6, c0, 4 @ HPFAR >>> + bne 3f >>> + >>> + /* Resolve IPA using the xFAR */ >>> + mcr p15, 0, r2, c7, c8, 0 @ ATS1CPR >> >> isn't there a race where guest user mode code writes some address >> which it can write, then we take the stage-2 fault, then another CPU > can't? >> changes the VA mapping to be accessible only in PL1, but user space >> code now gets a mapping to that memory which goes in the TLB and stays >> even after we return from handling the exception? > > It really doesn't matter. We do not grant extra permission to the pages > with Stage-2. When we resume the guest after handling the permission > fault (granting write access), it will evaluate the translation again, > and fault in the guest this time. > > M. > -- > Jazz is not dead. It just smells funny... > _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm