[Xen-devel] [PATCH] turn off writable page tables

zach at vmware.com (Zachary Amsden) · Fri, 28 Jul 2006 14:36:41 -0700

Keir Fraser wrote:
>
> On 28 Jul 2006, at 16:51, Ian Pratt wrote:
>
>>> So, in summary, we know writable page tables are not broken, they just
>>> don't help on typical workloads because the PTEs/page are so low.
>>> However, they do hurt SMP guest performance.  If we are not seeing a
>>> benefit today, should we turn it off?  Should we make it a compile time
>>> option, with the default off?
>>
>> I wouldn't mind seeing wrpt removed altogether, or at least emulation
>> made the compile time default for the moment. There's bound to be some
>> workload that bites us in the future which is why batching updates on
>> the fork path mightn't be a bad thing if it can be done without too much
>> gratuitous hacking of linux core code.
>
> My only fear is that batched wrpt has some guest-visible effects. For 
> example, the guest has to be able to cope with seeing page directory 
> entries with the present bit cleared. Also, on SMP, it has to be able 
> to cope with spurious page faults anywhere in its address space (e.g., 
> faults on an unhooked page table which some other VCPU has rehooked by 
> the time the Xen pagefault handler runs, hence the fault is bounced 
> back to the guest even though there is no work to be done). If we turn 
> off batched wrpt then guests will not be tested against it and we are 
> likely to hit problems if we ever want to turn it back on again -- 
> we'll find that some guests are not able to correctly handle the weird 
> side effects.
>
> On the other hand, perhaps we can find a neater more explicit 
> alternative to batched wrpt in future.

The attached patch is a very nice win for shadow page tables on SMP.
Basically, we use the lazy MMU state information to defer all the MMU
hypercalls into a single flush, which happens when leaving lazy MMU mode.

At the PT level, this can be done without gratuitous hacking of Linux
core code.  However, it cannot safely be extended to also cover setting
the parent page directory entry on SMP.  It is a little unclear exactly
how this would work under a direct page table hypervisor: would you still
take the faults, or would you re-type and reprotect the pages first?  In
the fork case, two page tables can be updated because of COW, but
re-typing both pages changes the crossover point at which batching
becomes a win.  But if the same hooks can be used for direct mode, it
makes sense to figure that out now so we don't have to add 4 different
sets of hooks to Linux (UP and SMP want slightly different batching
models, as might shadow and direct mode).

The PDE present bit going missing is still a problem.  Linux can be
changed to deal with that, but it is messy.

One remaining issue for deferring direct page table updates is the
potential for read hazards.  I believe there is only one read hazard in
the Linux mm code that could be exposed here; explicit, rather than
implicit, batching makes it much easier to reason about that.

Zach
-------------- next part --------------
Name: lazy-mmu-batching
Url: http://lists.osdl.org/pipermail/virtualization/attachments/20060728/b10ddee7/attachment.bat