On Mon, Jun 03, 2019 at 01:03:04PM -0400, Nitesh Narayan Lal wrote: > This patch series proposes an efficient mechanism for communicating free memory > from a guest to its hypervisor. It especially enables guests with no page cache > (e.g., nvdimm, virtio-pmem) or with small page caches (e.g., ram > disk) to > rapidly hand back free memory to the hypervisor. > This approach has a minimal impact on the existing core-mm infrastructure. Could you help us compare with Alex's series? What are the main differences? > Measurement results (measurement details appended to this email): > * With active page hinting, 3 more guests could be launched each of 5 GB(total > 5 vs. 2) on a 15GB (single NUMA) system without swapping. > * With active page hinting, on a system with 15 GB of (single NUMA) memory and > 4GB of swap, the runtime of "memhog 6G" in 3 guests (run sequentially) resulted > in the last invocation to only need 37s compared to 3m35s without page hinting. > > This approach tracks all freed pages of the order MAX_ORDER - 2 in bitmaps. > A new hook after buddy merging is used to set the bits in the bitmap. > Currently, the bits are only cleared when pages are hinted, not when pages are > re-allocated. > > Bitmaps are stored on a per-zone basis and are protected by the zone lock. A > workqueue asynchronously processes the bitmaps as soon as a pre-defined memory > threshold is met, trying to isolate and report pages that are still free. > > The isolated pages are reported via virtio-balloon, which is responsible for > sending batched pages to the host synchronously. Once the hypervisor processed > the hinting request, the isolated pages are returned back to the buddy. > > The key changes made in this series compared to v9[1] are: > * Pages only in the chunks of "MAX_ORDER - 2" are reported to the hypervisor to > not break up the THP. > * At a time only a set of 16 pages can be isolated and reported to the host to > avoids any false OOMs. > * page_hinting.c is moved under mm/ from virt/kvm/ as the feature is dependent > on virtio and not on KVM itself. This would enable any other hypervisor to use > this feature by implementing virtio devices. > * The sysctl variable is replaced with a virtio-balloon parameter to > enable/disable page-hinting. > > Pending items: > * Test device assigned guests to ensure that hinting doesn't break it. > * Follow up on VIRTIO_BALLOON_F_PAGE_POISON's device side support. > * Compare reporting free pages via vring with vhost. > * Decide between MADV_DONTNEED and MADV_FREE. > * Look into memory hotplug, more efficient locking, possible races when > disabling. > * Come up with proper/traceable error-message/logs. > * Minor reworks and simplifications (e.g., virtio protocol). > > Benefit analysis: > 1. Use-case - Number of guests that can be launched without swap usage > NUMA Nodes = 1 with 15 GB memory > Guest Memory = 5 GB > Number of cores in guest = 1 > Workload = test allocation program allocates 4GB memory, touches it via memset > and exits. > Procedure = > The first guest is launched and once its console is up, the test allocation > program is executed with 4 GB memory request (Due to this the guest occupies > almost 4-5 GB of memory in the host in a system without page hinting). Once > this program exits at that time another guest is launched in the host and the > same process is followed. It is continued until the swap is not used. > > Results: > Without hinting = 3, swap usage at the end 1.1GB. > With hinting = 5, swap usage at the end 0. > > 2. Use-case - memhog execution time > Guest Memory = 6GB > Number of cores = 4 > NUMA Nodes = 1 with 15 GB memory > Process: 3 Guests are launched and the ‘memhog 6G’ execution time is monitored > one after the other in each of them. > Without Hinting - Guest1:47s, Guest2:53s, Guest3:3m35s, End swap usage: 3.5G > With Hinting - Guest1:40s, Guest2:44s, Guest3:37s, End swap usage: 0 > > Performance analysis: > 1. will-it-scale's page_faul1: > Guest Memory = 6GB > Number of cores = 24 > > Without Hinting: > tasks,processes,processes_idle,threads,threads_idle,linear > 0,0,100,0,100,0 > 1,315890,95.82,317633,95.83,317633 > 2,570810,91.67,531147,91.94,635266 > 3,826491,87.54,713545,88.53,952899 > 4,1087434,83.40,901215,85.30,1270532 > 5,1277137,79.26,916442,83.74,1588165 > 6,1503611,75.12,1113832,79.89,1905798 > 7,1683750,70.99,1140629,78.33,2223431 > 8,1893105,66.85,1157028,77.40,2541064 > 9,2046516,62.50,1179445,76.48,2858697 > 10,2291171,58.57,1209247,74.99,3176330 > 11,2486198,54.47,1217265,75.13,3493963 > 12,2656533,50.36,1193392,74.42,3811596 > 13,2747951,46.21,1185540,73.45,4129229 > 14,2965757,42.09,1161862,72.20,4446862 > 15,3049128,37.97,1185923,72.12,4764495 > 16,3150692,33.83,1163789,70.70,5082128 > 17,3206023,29.70,1174217,70.11,5399761 > 18,3211380,25.62,1179660,69.40,5717394 > 19,3202031,21.44,1181259,67.28,6035027 > 20,3218245,17.35,1196367,66.75,6352660 > 21,3228576,13.26,1129561,66.74,6670293 > 22,3207452,9.15,1166517,66.47,6987926 > 23,3153800,5.09,1172877,61.57,7305559 > 24,3184542,0.99,1186244,58.36,7623192 > > With Hinting: > 0,0,100,0,100,0 > 1,306737,95.82,305130,95.78,306737 > 2,573207,91.68,530453,91.92,613474 > 3,810319,87.53,695281,88.58,920211 > 4,1074116,83.40,880602,85.48,1226948 > 5,1308283,79.26,1109257,81.23,1533685 > 6,1501987,75.12,1093661,80.19,1840422 > 7,1695300,70.99,1104207,79.03,2147159 > 8,1901523,66.85,1193613,76.90,2453896 > 9,2051288,62.73,1200913,76.22,2760633 > 10,2275771,58.60,1192992,75.66,3067370 > 11,2435016,54.48,1191472,74.66,3374107 > 12,2623114,50.35,1196911,74.02,3680844 > 13,2766071,46.22,1178589,73.02,3987581 > 14,2932163,42.10,1166414,72.96,4294318 > 15,3000853,37.96,1177177,72.62,4601055 > 16,3113738,33.85,1165444,70.54,4907792 > 17,3132135,29.77,1165055,68.51,5214529 > 18,3175121,25.69,1166969,69.27,5521266 > 19,3205490,21.61,1159310,65.65,5828003 > 20,3220855,17.52,1171827,62.04,6134740 > 21,3182568,13.48,1138918,65.05,6441477 > 22,3130543,9.30,1128185,60.60,6748214 > 23,3087426,5.15,1127912,55.36,7054951 > 24,3099457,1.04,1176100,54.96,7361688 > > [1] https://lkml.org/lkml/2019/3/6/413 >