On 27.11.19 18:36, Alexander Duyck wrote:
> On Wed, 2019-11-27 at 11:01 +0100, David Hildenbrand wrote:
>> On 26.11.19 17:45, Alexander Duyck wrote:
>>> On Tue, 2019-11-26 at 13:20 +0100, David Hildenbrand wrote:
>>>> On 19.11.19 22:46, Alexander Duyck wrote:
> 
> <snip>
> 
>>>>> Below are the results from various benchmarks. I primarily focused on
>>>>> two tests. The first is the will-it-scale/page_fault2 test, and the
>>>>> other is a modified version of will-it-scale/page_fault1 that was
>>>>> enabled to use THP. I did this as it allows for better visibility into
>>>>> different parts of the memory subsystem. The guest is running with 32G
>>>>> of RAM on one node of an E5-2630 v3. The host has had some power saving
>>>>> features disabled by setting the /dev/cpu_dma_latency value to 10ms.
>>>>>
>>>>> Test                 page_fault1 (THP)        page_fault2
>>>>> Name          tasks  Process Iter   STDEV     Process Iter   STDEV
>>>>> Baseline          1    1203934.75   0.04%       379940.75    0.11%
>>>>>                  16    8828217.00   0.85%      3178653.00    1.28%
>>>>>
>>>>> Patches applied   1    1207961.25   0.10%       380852.25    0.25%
>>>>>                  16    8862373.00   0.98%      3246397.25    0.68%
>>>>>
>>>>> Patches enabled   1    1207758.75   0.17%       373079.25    0.60%
>>>>>  MADV disabled   16    8870373.75   0.29%      3204989.75    1.08%
>>>>>
>>>>> Patches enabled   1    1261183.75   0.39%       373201.50    0.50%
>>>>>                  16    8371359.75   0.65%      3233665.50    0.84%
>>>>>
>>>>> Patches enabled   1    1090201.50   0.25%       376967.25    0.29%
>>>>>  page shuffle    16    8108719.75   0.58%      3218450.25    1.07%
>>>>>
>>>>> The results above are for a baseline with a linux-next-20191115 kernel;
>>>>> that kernel with this patch set applied but page reporting disabled in
>>>>> virtio-balloon; the patches applied but madvise disabled by directly
>>>>> assigning a device; the patches applied and page reporting fully
>>>>> enabled; and the patches enabled with page shuffling enabled. These
>>>>> results include the deviation seen between the average value reported
>>>>> here and the high and/or low value. I observed that during the test,
>>>>> memory usage for the first three cases never dropped, whereas with the
>>>>> patches fully enabled the VM would drop to using only a few GB of the
>>>>> host's memory when switching from memhog to the page fault tests.
>>>>>
>>>>> Most of the overhead seen with this patch set enabled seems to be due
>>>>> to page faults caused by accessing the reported pages and the host
>>>>> zeroing the page before giving it back to the guest. This overhead is
>>>>> much more visible when using THP than with standard 4K pages. In
>>>>> addition, page shuffling seemed to increase the number of faults
>>>>> generated due to an increase in memory churn.
>>>>
>>>> MADV_FREE would be interesting.
>>>
>>> I can probably code something up. However, that is going to push a bunch
>>> of complexity into the QEMU code and doesn't really mean much to the
>>> kernel code. I can probably add it as another QEMU patch to the set,
>>> since it is just a matter of having a function similar to
>>> ram_block_discard_range that uses MADV_FREE instead of MADV_DONTNEED.
>>
>> Yes, an add-on patch makes perfect sense. The nice thing about MADV_FREE
>> is that you only take back pages from a process when really under memory
>> pressure (before going to swap). You will still get a page fault on the
>> next access (to identify that the page is still in use after all), but
>> don't have to fault in a fresh page.
> 
> So I got things running with a proof of concept using MADV_FREE.
> Apparently another roadblock I hadn't realized is that you have to have
> the right version of glibc for MADV_FREE to be present.
> 
> Anyway with MADV_FREE the numbers actually look pretty close to the
> numbers with the madvise disabled. Apparently the page fault overhead
> isn't all that significant. When I push the next patch set I will include
> the actual numbers, but even with shuffling enabled the results were in
> the 8.7 to 8.8 million iteration range.
> 

Cool, thanks for evaluating!

-- 
Thanks,

David / dhildenb
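
[Editor's note: Alexander's MADV_FREE idea amounts to a small helper in QEMU
modeled on ram_block_discard_range(). Below is a minimal sketch of what such
a helper might look like; the ram_block_free_range() name, its signature, and
the MADV_DONTNEED fallback are illustrative assumptions rather than the
actual QEMU patch, and the #ifdef guard reflects the glibc availability issue
Alexander mentions (older headers do not define MADV_FREE).]

    #include <errno.h>
    #include <stddef.h>
    #include <sys/mman.h>

    /*
     * Sketch only: advise the host that a range of guest RAM is free.
     * Modeled on QEMU's ram_block_discard_range(), but using MADV_FREE so
     * the host only reclaims the pages when it is actually under memory
     * pressure, instead of dropping them immediately as MADV_DONTNEED does.
     */
    static int ram_block_free_range(void *host_addr, size_t length)
    {
    #ifdef MADV_FREE
        /* Requires headers new enough to define MADV_FREE (Linux 4.5+). */
        const int advice = MADV_FREE;
    #else
        /* Fall back to the existing discard behaviour otherwise. */
        const int advice = MADV_DONTNEED;
    #endif

        if (madvise(host_addr, length, advice) < 0) {
            return -errno;
        }
        return 0;
    }

A caller would pass the host virtual address and length of the guest page
range being reported, much as the MADV_DONTNEED path is driven today.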