On Tue, Nov 13, 2012 at 8:08 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: > On 13/11/12 12:56, Christoffer Dall wrote: >> On Tue, Nov 13, 2012 at 5:43 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: >>> On 12/11/12 22:55, Christoffer Dall wrote: >>>> On Mon, Nov 12, 2012 at 12:02 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: >>>>> This patch fixes a very long standing bug in the KVM fault handling and >>>>> is basically a backport of the equivalent arm64 code. >>>>> >>>>> The ARM ARM is quite explicit in saying the HPFAR cannot be trusted >>>>> for a permission fault, unless it is a Stage-1 page table walk. >>>>> >>>>> The fact that it *seems* to work well enough on A15 is even more scary. >>>>> We may well be silently corrupting memory without even noticing (hint, >>>>> hint...). It is also quite possible that this bug would prevent KVM >>>>> from running on a CPU which caches TLBs covering VA to PA with a single >>>>> entry. >>>>> >>>>> The fix is to detect aborts that are permission faults, and not stage-1 >>>>> page table walk. We can then resolve the IPA, and store it just like the >>>>> HPFAR register. If the IPA cannot be resolved, it means another CPU is >>>>> playing with the page tables, and we simply restart the guest. >>>>> >>>>> This also paves the way for lighter Icache invalidation, using the NX bit. >>>> >>>> My main concern here is the complicated return code and this assembly >>>> code is not too easy to read and it duplicates the code in >>>> __kvm_va_to_pa. This is supposed to be a rare case, because if not, >>>> we're merging the wrong pages anyway, and the system performance will >>>> suck. >>> >>> This is not exactly rare. There's a number of cases were we fault a page >>> by reading from it, and then writing to it. A lot of static data in the >>> kernel causes this, actually. >>> >> >> yes, but unless that page is shared with another process at the point >> when it's faulted in we get the page back as writable from >> get_user_pages and map it writable in the stage-2 table from the >> get-go. > > Are you sure? It looks like we only mark it writable if it is a write > access... > So we check the writeable return value for whether we should map it read-only or writeable, and the KVM code should tell is a page is writable if it actually is, only __get_user_pages_fast is not defined on ARM, so you're actually right, we map everything as read-only that's faulted in on a read. This shouldn't be anything else than the kernel image though, after decompression I would be surprised to see many pages faulted in on a write. If it's something that happens a lot, we should really consider optimizing it an not take a double stage-2 fault on private writeable pages. _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm