"Alex Ng (LIS)" <alexng@xxxxxxxxxxxxx> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
>> Sent: Friday, August 5, 2016 3:49 AM
>> To: devel@xxxxxxxxxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>;
>> KY Srinivasan <kys@xxxxxxxxxxxxx>; Alex Ng (LIS) <alexng@xxxxxxxxxxxxx>
>> Subject: [PATCH 2/4] Drivers: hv: balloon: account for gaps in hot add regions
>>
>> I'm observing the following hot add requests from the WS2012 host:
>>
>> hot_add_req: start_pfn = 0x108200 count = 330752
>> hot_add_req: start_pfn = 0x158e00 count = 193536
>> hot_add_req: start_pfn = 0x188400 count = 239616
>>
>> As the host doesn't specify hot add regions we're trying to create a
>> 128Mb-aligned region covering the first request, we create the 0x108000 -
>> 0x160000 region and we add 0x108000 - 0x158e00 memory. The second
>> request passes the pfn_covered() check, we enlarge the region to 0x108000 -
>> 0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with
>> the third request as it starts at 0x188400 so there is a 0x200 gap which is not
>> covered. As the end of our region is 0x190000 now it again passes the
>> pfn_covered() check where we just adjust the covered_end_pfn and make it
>> 0x188400 instead of 0x188200 which means that we'll try to online
>> 0x188200-0x188400 pages but these pages were never assigned to us and we
>> crash.
>
> The fact that the host sent a request that's non-contiguous with the previous
> request is unexpected. Could we check to see the number of pages we returned
> in our response, after each request?
>
> I'm wondering if we may have given a wrong response to cause the host to
> follow-up with a gapped request.

It seems this is not the case; here is the recorded session (addresses are
in hex, counts are in decimal):

[ 66.851401] DM: hot_add_req: 108200 303104 0 0

-> we were asked to add 303104 pages

...
[ 66.854420] DM: handle_pg_range: 108200 303104
[ 84.489291] DM: handle_pg_range: return 303104
[ 84.492498] DM: hot_add_req: ret 303104

-> and we returned '303104'

[ 131.934542] DM: hot_add_req: 152200 221184 0 0

-> we were asked to add 221184 pages

...
[ 131.937495] DM: handle_pg_range: 152200 221184
[ 132.720390] DM: handle_pg_range: return 221184
[ 132.722953] DM: hot_add_req: ret 221184

-> and we returned '221184'

[ 132.958045] DM: hot_add_req: 188400 409088 0 0

-> and here we were asked to add pages with a gap (0x108200 + 303104 +
221184 = 0x188200, but as you can see the new range starts at 0x188400)

[ 132.961409] DM: handle_pg_range: 188400 409088
[ 134.012555] DM: handle_pg_range: return 409088
[ 134.013862] DM: hot_add_req: ret 409088

so I don't see a flaw on the Linux side ...

-- 
Vitaly
_______________________________________________
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxx
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel