On 6/3/19 2:04 PM, Michael S. Tsirkin wrote:
> On Mon, Jun 03, 2019 at 01:03:04PM -0400, Nitesh Narayan Lal wrote:
>> This patch series proposes an efficient mechanism for communicating free memory
>> from a guest to its hypervisor. It especially enables guests with no page cache
>> (e.g., nvdimm, virtio-pmem) or with small page caches (e.g., ram disk) to
>> rapidly hand back free memory to the hypervisor.
>> This approach has a minimal impact on the existing core-mm infrastructure.
> Could you help us compare with Alex's series?
> What are the main differences?

Below are results comparing the benefits/performance of Alexander's
bubble-hinting v1 [1] and page-hinting (v10, including some of the changes
suggested upstream) against an unmodified kernel.

Test1 - Number of guests that can be launched without swap usage
Guest size: 5GB
Cores: 4
Total NUMA Node Memory ~ 15 GB (All guests are running on a single node)
Process: Guests are launched sequentially; the next guest is started only
after an allocation program requesting 4GB has been run in the current one.
Results:
unmodified kernel: 2 guests without swap usage; the 3rd guest used 2.3GB of swap.
bubble-hinting v1: 4 guests without swap usage; the 5th guest used 1MB of swap.
page-hinting: 5 guests without swap usage; the 6th guest used 8MB of swap.

Test2 - Memhog execution time
Guest size: 6GB
Cores: 4
Total NUMA Node Memory ~ 15 GB (All guests are running on a single node)
Process: 3 guests are launched and "time memhog 6G" is run in each of them
sequentially.
Results:
unmodified kernel: Guest1-40s, Guest2-1m5s, Guest3-6m38s (swap usage at the end: 3.6G)
bubble-hinting v1: Guest1-32s, Guest2-58s, Guest3-35s (swap usage at the end: 0)
page-hinting: Guest1-42s, Guest2-47s, Guest3-32s (swap usage at the end: 0)

Test3 - Will-it-scale's page_fault1
Guest size: 6GB
Cores: 24
Total NUMA Node Memory ~ 15 GB (All guests are running on a single node)

unmodified kernel:
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
1,459168,95.83,459315,95.83,459315
2,956272,91.68,884643,91.72,918630
3,1407811,87.53,1267948,87.69,1377945
4,1755744,83.39,1562471,83.73,1837260
5,2056741,79.24,1812309,80.00,2296575
6,2393759,75.09,2025719,77.02,2755890
7,2754403,70.95,2238180,73.72,3215205
8,2947493,66.81,2369686,70.37,3674520
9,3063579,62.68,2321148,68.84,4133835
10,3229023,58.54,2377596,65.84,4593150
11,3337665,54.40,2429818,64.01,5052465
12,3255140,50.28,2395070,61.63,5511780
13,3260721,46.11,2402644,59.77,5971095
14,3210590,42.02,2390806,57.46,6430410
15,3164811,37.88,2265352,51.39,6889725
16,3144764,33.77,2335028,54.07,7349040
17,3128839,29.63,2328662,49.52,7808355
18,3133344,25.50,2301181,48.01,8267670
19,3135979,21.38,2343003,43.66,8726985
20,3136448,17.27,2306109,40.81,9186300
21,3130324,13.16,2403688,35.84,9645615
22,3109883,9.04,2290808,36.24,10104930
23,3136805,4.94,2263818,35.43,10564245
24,3118949,0.78,2252891,31.03,11023560

bubble-hinting v1:
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
1,292183,95.83,292428,95.83,292428
2,540606,91.67,501887,91.91,584856
3,821748,87.53,735244,88.31,877284
4,1033782,83.38,839925,85.59,1169712
5,1261352,79.25,896464,83.86,1462140
6,1459544,75.12,1050094,80.93,1754568
7,1686537,70.97,1112202,79.23,2046996
8,1866892,66.83,1083571,78.48,2339424
9,2056887,62.72,1101660,77.94,2631852
10,2252955,58.57,1097439,77.36,2924280
11,2413907,54.40,1088583,76.72,3216708
12,2596504,50.35,1117474,76.01,3509136
13,2715338,46.21,1087666,75.32,3801564
14,2861697,42.08,1084692,74.35,4093992
15,2964620,38.02,1087910,73.40,4386420
16,3065575,33.84,1099406,71.07,4678848
17,3107674,29.76,1056948,71.36,4971276
18,3144963,25.71,1094883,70.14,5263704
19,3173468,21.61,1073049,66.21,5556132
20,3173233,17.55,1072417,67.16,5848560
21,3209710,13.37,1079147,65.64,6140988
22,3182958,9.37,1085872,65.95,6433416
23,3200747,5.23,1076414,59.40,6725844
24,3181699,1.04,1051233,65.62,7018272

page-hinting:
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
1,467693,95.83,467970,95.83,467970
2,967860,91.68,895883,91.70,935940
3,1408191,87.53,1279602,87.68,1403910
4,1766250,83.39,1557224,83.93,1871880
5,2124689,79.24,1834625,80.35,2339850
6,2413514,75.10,1989557,77.00,2807820
7,2644648,70.95,2158055,73.73,3275790
8,2896483,66.81,2305785,70.85,3743760
9,3157796,62.67,2304083,69.49,4211730
10,3251633,58.53,2379589,66.43,4679700
11,3313704,54.41,2349310,64.76,5147670
12,3285612,50.30,2362013,62.63,5615640
13,3207275,46.17,2377760,59.94,6083610
14,3221727,42.02,2416278,56.70,6551580
15,3194781,37.91,2334552,54.96,7019550
16,3211818,33.78,2399077,52.75,7487520
17,3172664,29.65,2337660,50.27,7955490
18,3177152,25.49,2349721,47.02,8423460
19,3149924,21.36,2319286,40.16,8891430
20,3166910,17.30,2279719,43.23,9359400
21,3159464,13.19,2342849,34.84,9827370
22,3167091,9.06,2285156,37.97,10295340
23,3174137,4.96,2365448,33.74,10763310
24,3161629,0.86,2253813,32.38,11231280

Test4 - Netperf
Guest size: 5GB
Cores: 4
Total NUMA Node Memory ~ 15 GB (All guests are running on a single node)
Netserver: Running on core 0
Netperf: Running on core 1
Recv Socket Size bytes: 131072
Send Socket Size bytes: 16384
Send Message Size bytes: 1000000000
Time: 900s
Process: netperf is run 3 times sequentially in the same guest with the
inputs mentioned above, and the throughput (10^6 bits/sec) is observed.
Results:
unmodified kernel: 1st Run-14769.60, 2nd Run-14849.18, 3rd Run-14842.02
bubble-hinting v1: 1st Run-13441.77, 2nd Run-13487.81, 3rd Run-13503.87
page-hinting: 1st Run-14308.20, 2nd Run-14344.36, 3rd Run-14450.07

Drawback with bubble-hinting: more invasive.
Drawback with page-hinting: an additional bitmap is required, including
growing/shrinking it on memory hotplug.

[1] https://lkml.org/lkml/2019/6/19/926

>> Measurement results (measurement details appended to this email):
>> * With active page hinting, 3 more guests of 5 GB each could be launched
>> (total 5 vs. 2) on a 15GB (single NUMA) system without swapping.
>> * With active page hinting, on a system with 15 GB of (single NUMA) memory
>> and 4GB of swap, the runtime of "memhog 6G" in 3 guests (run sequentially)
>> resulted in the last invocation needing only 37s, compared to 3m35s without
>> page hinting.
>>
>> This approach tracks all freed pages of order MAX_ORDER - 2 in bitmaps.
>> A new hook after buddy merging is used to set the bits in the bitmap.
>> Currently, the bits are only cleared when pages are hinted, not when pages
>> are re-allocated.
>>
>> Bitmaps are stored on a per-zone basis and are protected by the zone lock.
>> A workqueue asynchronously processes the bitmaps as soon as a pre-defined
>> memory threshold is met, trying to isolate and report pages that are still
>> free.
>>
>> The isolated pages are reported via virtio-balloon, which is responsible
>> for sending batched pages to the host synchronously. Once the hypervisor
>> has processed the hinting request, the isolated pages are returned to the
>> buddy.
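To make the quoted description concrete, here is a minimal sketch of the
guest-side bookkeeping it describes. It is illustrative only: the layout and
every name in it (struct zone_hint_state, hint_mark_free(), hint_work_fn(),
and the threshold/batch constants) are hypothetical stand-ins, not the
identifiers used in the actual patches.

/*
 * Illustrative sketch only -- all names are hypothetical stand-ins,
 * not the actual patch code.
 */
#include <linux/bitmap.h>
#include <linux/mmzone.h>
#include <linux/workqueue.h>

#define HINT_ORDER	(MAX_ORDER - 2)	/* chunk size that keeps THPs intact */
#define HINT_BATCH	16		/* at most 16 pages isolated at once */

/* Hypothetical per-zone hinting state, protected by the zone lock. */
struct zone_hint_state {
	unsigned long *bitmap;		/* one bit per HINT_ORDER-sized chunk */
	unsigned long base_pfn;		/* first PFN covered by the bitmap */
	unsigned long free_chunks;	/* chunks marked since the last scan */
	unsigned long threshold;	/* pre-defined wakeup threshold */
	struct work_struct work;	/* async bitmap processing */
};

/* Hook called after buddy merging, with the zone lock already held. */
static void hint_mark_free(struct zone_hint_state *hs, unsigned long pfn)
{
	__set_bit((pfn - hs->base_pfn) >> HINT_ORDER, hs->bitmap);

	/* Kick the async worker once enough free memory has accumulated. */
	if (++hs->free_chunks >= hs->threshold)
		schedule_work(&hs->work);
}

static void hint_work_fn(struct work_struct *work)
{
	/*
	 * Scan the bitmap, isolate up to HINT_BATCH chunks that are still
	 * free, report them to the host synchronously via virtio-balloon,
	 * then return them to the buddy and clear their bits.
	 */
}

The drawbacks listed above fall out of this shape: the bitmap is extra
state that has to be sized per zone and grown/shrunk on memory hotplug,
while bubble-hinting avoids the bitmap by (roughly speaking) tracking
hinted pages inside the allocator itself, at the cost of being more
invasive to core mm.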
>>
>> The key changes made in this series compared to v9 [1] are:
>> * Pages are reported to the hypervisor only in chunks of "MAX_ORDER - 2"
>> so as not to break up THPs.
>> * Only a set of 16 pages can be isolated and reported to the host at a
>> time, to avoid any false OOMs.
>> * page_hinting.c is moved from virt/kvm/ to mm/, as the feature depends
>> on virtio and not on KVM itself. This enables any other hypervisor to use
>> this feature by implementing virtio devices.
>> * The sysctl variable is replaced with a virtio-balloon parameter to
>> enable/disable page-hinting.
>>
>> Pending items:
>> * Test device-assigned guests to ensure that hinting doesn't break them.
>> * Follow up on VIRTIO_BALLOON_F_PAGE_POISON's device-side support.
>> * Compare reporting free pages via vring with vhost.
>> * Decide between MADV_DONTNEED and MADV_FREE.
>> * Look into memory hotplug, more efficient locking, and possible races
>> when disabling.
>> * Come up with proper/traceable error messages/logs.
>> * Minor reworks and simplifications (e.g., the virtio protocol).
>>
>> Benefit analysis:
>> 1. Use-case - Number of guests that can be launched without swap usage
>> NUMA Nodes = 1 with 15 GB memory
>> Guest Memory = 5 GB
>> Number of cores in guest = 1
>> Workload = test allocation program allocates 4GB memory, touches it via
>> memset, and exits.
>> Procedure =
>> The first guest is launched and, once its console is up, the test
>> allocation program is executed with a 4 GB memory request (due to this,
>> the guest occupies almost 4-5 GB of memory on the host in a system
>> without page hinting). Once this program exits, another guest is launched
>> on the host and the same process is followed. This is repeated until swap
>> comes into use.
>>
>> Results:
>> Without hinting = 3, swap usage at the end 1.1GB.
>> With hinting = 5, swap usage at the end 0.
>>
>> 2. Use-case - memhog execution time
>> Guest Memory = 6GB
>> Number of cores = 4
>> NUMA Nodes = 1 with 15 GB memory
>> Process: 3 guests are launched and the 'memhog 6G' execution time is
>> monitored one after the other in each of them.
>> Without Hinting - Guest1:47s, Guest2:53s, Guest3:3m35s, End swap usage: 3.5G
>> With Hinting - Guest1:40s, Guest2:44s, Guest3:37s, End swap usage: 0
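As an aside, the "test allocation program" in use-case 1 above is described
but not included in the posting; a minimal stand-in matching that
description (allocate 4GB, touch it via memset, exit) could look like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	size_t size = 4UL << 30;	/* 4 GB */
	char *buf = malloc(size);

	if (!buf) {
		perror("malloc");
		return 1;
	}
	memset(buf, 1, size);		/* fault in every page */
	free(buf);			/* freed memory becomes hintable */
	return 0;
}

The memset forces the host to back the full 4GB; whether that memory is
handed back to the host once the program exits and the guest frees it is
what the guest counts above effectively measure.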
>>
>> Performance analysis:
>> 1. will-it-scale's page_fault1:
>> Guest Memory = 6GB
>> Number of cores = 24
>>
>> Without Hinting:
>> tasks,processes,processes_idle,threads,threads_idle,linear
>> 0,0,100,0,100,0
>> 1,315890,95.82,317633,95.83,317633
>> 2,570810,91.67,531147,91.94,635266
>> 3,826491,87.54,713545,88.53,952899
>> 4,1087434,83.40,901215,85.30,1270532
>> 5,1277137,79.26,916442,83.74,1588165
>> 6,1503611,75.12,1113832,79.89,1905798
>> 7,1683750,70.99,1140629,78.33,2223431
>> 8,1893105,66.85,1157028,77.40,2541064
>> 9,2046516,62.50,1179445,76.48,2858697
>> 10,2291171,58.57,1209247,74.99,3176330
>> 11,2486198,54.47,1217265,75.13,3493963
>> 12,2656533,50.36,1193392,74.42,3811596
>> 13,2747951,46.21,1185540,73.45,4129229
>> 14,2965757,42.09,1161862,72.20,4446862
>> 15,3049128,37.97,1185923,72.12,4764495
>> 16,3150692,33.83,1163789,70.70,5082128
>> 17,3206023,29.70,1174217,70.11,5399761
>> 18,3211380,25.62,1179660,69.40,5717394
>> 19,3202031,21.44,1181259,67.28,6035027
>> 20,3218245,17.35,1196367,66.75,6352660
>> 21,3228576,13.26,1129561,66.74,6670293
>> 22,3207452,9.15,1166517,66.47,6987926
>> 23,3153800,5.09,1172877,61.57,7305559
>> 24,3184542,0.99,1186244,58.36,7623192
>>
>> With Hinting:
>> tasks,processes,processes_idle,threads,threads_idle,linear
>> 0,0,100,0,100,0
>> 1,306737,95.82,305130,95.78,306737
>> 2,573207,91.68,530453,91.92,613474
>> 3,810319,87.53,695281,88.58,920211
>> 4,1074116,83.40,880602,85.48,1226948
>> 5,1308283,79.26,1109257,81.23,1533685
>> 6,1501987,75.12,1093661,80.19,1840422
>> 7,1695300,70.99,1104207,79.03,2147159
>> 8,1901523,66.85,1193613,76.90,2453896
>> 9,2051288,62.73,1200913,76.22,2760633
>> 10,2275771,58.60,1192992,75.66,3067370
>> 11,2435016,54.48,1191472,74.66,3374107
>> 12,2623114,50.35,1196911,74.02,3680844
>> 13,2766071,46.22,1178589,73.02,3987581
>> 14,2932163,42.10,1166414,72.96,4294318
>> 15,3000853,37.96,1177177,72.62,4601055
>> 16,3113738,33.85,1165444,70.54,4907792
>> 17,3132135,29.77,1165055,68.51,5214529
>> 18,3175121,25.69,1166969,69.27,5521266
>> 19,3205490,21.61,1159310,65.65,5828003
>> 20,3220855,17.52,1171827,62.04,6134740
>> 21,3182568,13.48,1138918,65.05,6441477
>> 22,3130543,9.30,1128185,60.60,6748214
>> 23,3087426,5.15,1127912,55.36,7054951
>> 24,3099457,1.04,1176100,54.96,7361688
>>
>> [1] https://lkml.org/lkml/2019/3/6/413

-- 
Regards
Nitesh