On 04/21/2012 05:15 AM, Mike Waychison wrote:
> On Fri, Apr 20, 2012 at 6:56 PM, Takuya Yoshikawa
> <takuya.yoshikawa@xxxxxxxxx> wrote:
> > On Fri, 20 Apr 2012 16:07:41 -0700
> > Ying Han <yinghan@xxxxxxxxxx> wrote:
> >
> >> My understanding of the real pain is the poor implementation of the
> >> mmu_shrinker. It iterates all the registered mmu_shrink callbacks for
> >> each kvm and only does a little work at a time while holding two big
> >> locks. I learned from mikew@ (also ++cc-ed) that this is causing
> >> latency spikes and unfairness among kvm instances in some of the
> >> experiments we've seen.
>
> The pains we have with mmu_shrink are twofold:
>
>  - Memory pressure against the shrinker applies globally. Any task can
>    cause pressure within its own environment (using numa or memcg) and
>    cause the global shrinker to shrink all shadowed tables on the
>    system (regardless of how memory is isolated between tasks).
>
>  - Massive lock contention when all these CPUs hit the global lock
>    (which backs up everybody on the system).
>
> In our situation, we simply disable the shrinker altogether. My
> understanding is that with EPT or NPT, the amount of memory used by
> these tables is bounded by the size of guest physical memory, whereas
> with software-shadowed tables it is bounded by the address spaces in
> the guest. There is also a 2% (default) bound enforced on a per-VM
> basis. This bound makes it reasonable to not do any reclaim and to
> charge it as a "system overhead tax".
>
> As for data, the most impressive result was a massive improvement in
> round-trip latency to a webserver running in a guest while another
> process on the system was thrashing through page cache (on a dozen or
> so spinning disks, iirc). We were using fake-numa, and would otherwise
> not expect the antagonist to drastically affect the latency-sensitive
> task (a lot of effort went into making that work). Unfortunately, we
> saw the 99th-percentile latency riding at the 140ms timeout cut-off
> (it was likely tailing out much longer), with the 95th percentile at
> over 40ms. With the mmu_shrinker disabled, the 99th-percentile latency
> quickly dropped to about 20ms.
>
> CPU profiles were showing 30% of cpu time wasted on spinlocks, all in
> the mmu_list_lock, iirc.
>
> In our case, I'm much happier just disabling the damned thing
> altogether.

There is no mmu_list_lock.  Do you mean kvm_lock or kvm->mmu_lock?

If the former, then we could easily fix this by dropping kvm_lock while
the work is being done.  If the latter, then it's more difficult.
(kvm_lock being contended implies that mmu_shrink is called
concurrently?)

-- 
error compiling committee.c: too many arguments to function
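
For readers without the source handy, the callback under discussion
(mmu_shrink in arch/x86/kvm/mmu.c) has roughly the shape below in
kernels of this era. This is a simplified sketch from memory rather
than the exact code, but it shows the two locks in play: the global
kvm_lock held across a walk of every VM on the system, and each VM's
mmu_lock taken inside that walk:

/*
 * Simplified sketch of mmu_shrink, circa 3.3/3.4 -- details elided.
 * The locking structure is the point: kvm_lock serializes all
 * shrinker callers against each other (and against VM creation and
 * destruction) for the whole walk of vm_list, and each VM's mmu_lock
 * stalls that guest's page faults while its shadow pages are zapped.
 */
static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
{
	struct kvm *kvm;
	int nr_to_scan = sc->nr_to_scan;

	if (nr_to_scan == 0)
		goto out;

	raw_spin_lock(&kvm_lock);		/* global lock */
	list_for_each_entry(kvm, &vm_list, vm_list) {
		int idx;
		LIST_HEAD(invalid_list);

		idx = srcu_read_lock(&kvm->srcu);
		spin_lock(&kvm->mmu_lock);	/* per-VM lock */
		if (kvm->arch.n_used_mmu_pages > 0) {
			/* zap a little, then move on to the next VM */
			kvm_mmu_remove_some_alloc_mmu_pages(kvm, &invalid_list);
			kvm_mmu_commit_zap_page(kvm, &invalid_list);
		}
		spin_unlock(&kvm->mmu_lock);
		srcu_read_unlock(&kvm->srcu, idx);

		if (--nr_to_scan <= 0)
			break;
	}
	raw_spin_unlock(&kvm_lock);
out:
	return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
}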
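
The "2% (default) bound" Mike refers to is the per-VM cap computed by
kvm_mmu_calculate_mmu_pages(), shown here lightly abridged;
KVM_PERMILLE_MMU_PAGES is 20 per mille, i.e. 2% of the pages backing
the guest's memslots:

/* From arch/x86/kvm/mmu.c (abridged): the per-VM shadow-page cap is
 * 2% of guest memslot pages, with a floor of 64 pages. */
#define KVM_PERMILLE_MMU_PAGES	20
#define KVM_MIN_ALLOC_MMU_PAGES	64

unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
{
	unsigned int nr_pages = 0, nr_mmu_pages;
	struct kvm_memslots *slots;
	struct kvm_memory_slot *memslot;

	slots = kvm_memslots(kvm);
	kvm_for_each_memslot(memslot, slots)
		nr_pages += memslot->npages;

	nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
	nr_mmu_pages = max(nr_mmu_pages,
			   (unsigned int) KVM_MIN_ALLOC_MMU_PAGES);
	return nr_mmu_pages;
}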
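
If the contended lock is indeed kvm_lock, "dropping kvm_lock while the
work is being done" could look something like the following hypothetical
and untested sketch: pick a victim VM under kvm_lock, pin it with
kvm_get_kvm(), and release the global lock before doing the actual
zapping, so only the short list walk is serialized:

/* Hypothetical restructuring, not tested code: kvm_lock now covers
 * only the selection of a victim VM; the zapping work happens after
 * the global lock is dropped. */
static int mmu_shrink_drop_kvm_lock(struct shrinker *shrink,
				    struct shrink_control *sc)
{
	struct kvm *kvm, *victim = NULL;

	raw_spin_lock(&kvm_lock);
	list_for_each_entry(kvm, &vm_list, vm_list) {
		if (kvm->arch.n_used_mmu_pages > 0) {
			kvm_get_kvm(kvm);	/* pin across the unlock */
			victim = kvm;
			break;
		}
	}
	raw_spin_unlock(&kvm_lock);

	if (victim) {
		LIST_HEAD(invalid_list);
		int idx = srcu_read_lock(&victim->srcu);

		spin_lock(&victim->mmu_lock);
		kvm_mmu_remove_some_alloc_mmu_pages(victim, &invalid_list);
		kvm_mmu_commit_zap_page(victim, &invalid_list);
		spin_unlock(&victim->mmu_lock);

		srcu_read_unlock(&victim->srcu, idx);
		kvm_put_kvm(victim);		/* may destroy the VM */
	}

	return percpu_counter_read_positive(&kvm_total_used_mmu_pages);
}

One thing a real version would need to keep: the current code rotates
the VM it zapped to the tail of vm_list so that successive shrinker
calls spread the pain across VMs; without that, this sketch would pick
on the same victim every time.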