On Thu, Sep 12, 2013 at 06:01:37PM -0500, Alexander Graf wrote:
>
> On 05.08.2013, at 23:27, Paul Mackerras wrote:
>
> > Currently we request write access to all pages that get mapped into the
> > guest, even if the guest is only loading from the page.  This reduces
> > the effectiveness of KSM because it means that we unshare every page we
> > access.  Also, we always set the changed (C) bit in the guest HPTE if
> > it allows writing, even for a guest load.
> >
> > This fixes both these problems.  We pass an 'iswrite' flag to the
> > mmu.xlate() functions and to kvmppc_mmu_map_page() to indicate whether
> > the access is a load or a store.  The mmu.xlate() functions now only
> > set C for stores.  kvmppc_gfn_to_pfn() now calls gfn_to_pfn_prot()
> > instead of gfn_to_pfn() so that it can indicate whether we need write
> > access to the page, and get back a 'writable' flag to indicate whether
> > the page is writable or not.  If that 'writable' flag is clear, we then
> > make the host HPTE read-only even if the guest HPTE allowed writing.
> >
> > This means that we can get a protection fault when the guest writes to a
> > page that it has mapped read-write but which is read-only on the host
> > side (perhaps due to KSM having merged the page).  Thus we now call
> > kvmppc_handle_pagefault() for protection faults as well as HPTE not found
> > faults.  In kvmppc_handle_pagefault(), if the access was allowed by the
> > guest HPTE and we thus need to install a new host HPTE, we then need to
> > remove the old host HPTE if there is one.  This is done with a new
> > function, kvmppc_mmu_unmap_page(), which uses kvmppc_mmu_pte_vflush() to
> > find and remove the old host HPTE.
>
> Have you measured how much performance we lose by mapping it twice?
> Usually Linux will mark user pages that are not written to yet as
> non-writable, no? That's why I assumed that "may_write" is the same as
> "guest wants to write" back when I wrote this.

Anonymous user pages start out both writable and dirty, so I think
it's OK.

> I'm also afraid that a sequence like
>
>   ld	x,y
>   std	x,y
>
> in the kernel will trap twice and slow us down heavily. But maybe I'm
> just being paranoid. Can you please measure bootup time with and
> without this, as well as a fork bomb (spawn /bin/echo 1000 times and
> time it) with and without so we get a feeling for its impact?

OK, I can do that.

If a page is actually writable but the guest is only asking for read
access, we give it write access on the first fault, so I don't expect
to see any slowdown.  We would get the second fault mainly when KSM
has decided to share the underlying page, and there we do need the
second fault in order to do the copy-on-write.

Paul.