Hi Michael, On Tue, 7 Jan 2025 00:39:07 +0000 Michael Kelley <mhklinux@xxxxxxxxxxx> wrote: > From: Jacob Pan <jacob.pan@xxxxxxxxxxxxxxxxxxx> Sent: Thursday, > January 2, 2025 1:40 PM > > > > When device memory blocks are hot-added via > > add_memory_driver_managed(), the HV balloon driver intercepts them > > but fails to online these memory blocks. This could result in > > device driver initialization failures. > > > > To address this, fall back to the generic online callback for memory > > blocks not added by the HV balloon driver. Similar cases are > > handled the same way in virtio-mem driver. > > I had a little bit of trouble figuring out what problem this patch is > solving. Is this a correct description? > > The Hyper-V balloon driver installs a custom callback for handling > page onlining operations performed by the memory hotplug subsystem. > This custom callback is global, and overrides the default callback > (generic_online_page) that Linux otherwise uses. The custom > callback properly handles memory that is hot-added by the balloon > driver as part of a Hyper-V hot-add region. > > But memory can also be hot-added directly by a device driver for a > vPCI device, particularly GPUs. In such a case, the custom callback > installed by the balloon driver runs, but won't find the page in its > hot-add region list and doesn't online it, which could cause driver > initialization failures. > > Fix this by having the balloon custom callback run > generic_online_page() when the page isn't part of a Hyper-V hot-add > region, thereby doing the default Linux behavior. This allows device > driver hot-adds to work properly. Similar cases are handled the same > way in the virtio-mem driver. > Yes, your description has the complete story. I was also thinking the generic code could handle this if the custom callback is not a void return type. Then we don't have to duplicate the code in here and virtio-mem, perhaps a separate cleanup effort. > > > > Suggested-by: Vikram Sethi <vsethi@xxxxxxxxxx> > > Tested-by: Michael Frohlich <mfrohlich@xxxxxxxxxxxxx> > > Signed-off-by: Jacob Pan <jacob.pan@xxxxxxxxxxxxxxxxxxx> > > --- > > drivers/hv/hv_balloon.c | 18 ++++++++++-------- > > 1 file changed, 10 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c > > index a99112e6f0b8..c999daf34108 100644 > > --- a/drivers/hv/hv_balloon.c > > +++ b/drivers/hv/hv_balloon.c > > @@ -766,16 +766,18 @@ static void hv_online_page(struct page *pg, > > unsigned int order) > > struct hv_hotadd_state *has; > > unsigned long pfn = page_to_pfn(pg); > > > > - guard(spinlock_irqsave)(&dm_device.ha_lock); > > - list_for_each_entry(has, &dm_device.ha_region_list, list) { > > - /* The page belongs to a different HAS. */ > > - if (pfn < has->start_pfn || > > - (pfn + (1UL << order) > has->end_pfn)) > > - continue; > > + scoped_guard(spinlock_irqsave, &dm_device.ha_lock) { > > + list_for_each_entry(has, > > &dm_device.ha_region_list, list) { > > + /* The page belongs to a different HAS. */ > > + if (pfn < has->start_pfn || > > + (pfn + (1UL << order) > > > has->end_pfn)) > > + continue; > > > > - hv_bring_pgs_online(has, pfn, 1UL << order); > > - break; > > + hv_bring_pgs_online(has, pfn, 1UL << > > order); > > + return; > > + } > > } > > + generic_online_page(pg, order); > > } > > > > static int pfn_covered(unsigned long start_pfn, unsigned long > > pfn_cnt) -- > > 2.34.1 > > > > The code LGTM. I'd suggest updating the commit message along the > lines of what I wrote. My descriptions can be a bit wordy, but > hopefully they add clarity on the overall picture. > > In any case, > > Reviewed-by: Michael Kelley <mhklinux@xxxxxxxxxxx> > Will update the commit message in the next version, thanks a lot. Thanks, Jacob