From: David Hildenbrand <david@xxxxxxxxxx>
Sent: Thursday, May 2, 2024 12:17 AM
>
> On 01.05.24 17:14, mhkelley58@xxxxxxxxx wrote:
> > From: Michael Kelley <mhklinux@xxxxxxxxxxx>
> >
> > The Hyper-V balloon driver supports hot-add of memory in addition
> > to ballooning. Current code hot-adds in fixed-size chunks of
> > 128 MiB (fixed constant HA_CHUNK in the code). While this works
> > in Hyper-V VMs with 64 GiB or less of memory, where the Linux
> > memblock size is 128 MiB, the hot-add fails for larger memblock
> > sizes because add_memory() expects memory to be added in chunks
> > that match the memblock size. Messages like the following are
> > reported when Linux has a 256 MiB memblock size:
> >
> > [  312.668859] Block size [0x10000000] unaligned hotplug range:
> >                start 0x310000000, size 0x8000000
> > [  312.668880] hv_balloon: hot_add memory failed error is -22
> > [  312.668984] hv_balloon: Memory hot add failed
> >
> > Larger memblock sizes are usually used in VMs with more than
> > 64 GiB of memory, depending on the alignment of the VM's
> > physical address space.
> >
> > Fix this problem by having the Hyper-V balloon driver determine
> > the Linux memblock size, and process hot-add requests in that
> > chunk size instead of a fixed 128 MiB. Also update the hot-add
> > alignment requested of the Hyper-V host to match the memblock
> > size.
> >
> > The code changes look significant, but in fact are just a
> > simple text substitution of a new global variable for the
> > previous HA_CHUNK constant. No algorithms are changed except
> > to initialize the new global variable and to calculate the
> > alignment value to pass to Hyper-V. Testing with memblock
> > sizes of 256 MiB and 2 GiB shows correct operation.
> >
> > Signed-off-by: Michael Kelley <mhklinux@xxxxxxxxxxx>
> > ---
> > Changes in v2:
> > * Change new global variable name from ha_chunk_pgs to
> >   ha_pages_in_chunk [David Hildenbrand]
> > * Use kernel macros ALIGN(), ALIGN_DOWN(), and umin()
> >   to simplify code and reduce references to HA_CHUNK. For
> >   ease of review, this is done in a new patch preceding
> >   this one. [David Hildenbrand]
> >
> >  drivers/hv/hv_balloon.c | 55 +++++++++++++++++++++++++----------------
> >  1 file changed, 34 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> > index 9f45b8a6762c..e0a1a18041ca 100644
> > --- a/drivers/hv/hv_balloon.c
> > +++ b/drivers/hv/hv_balloon.c
> > @@ -425,11 +425,11 @@ struct dm_info_msg {
> >   * The range start_pfn : end_pfn specifies the range
> >   * that the host has asked us to hot add. The range
> >   * start_pfn : ha_end_pfn specifies the range that we have
> > - * currently hot added. We hot add in multiples of 128M
> > - * chunks; it is possible that we may not be able to bring
> > - * online all the pages in the region. The range
> > + * currently hot added. We hot add in chunks equal to the
> > + * memory block size; it is possible that we may not be able
> > + * to bring online all the pages in the region. The range
> >   * covered_start_pfn:covered_end_pfn defines the pages that can
> > - * be brough online.
> > + * be brought online.
> >   */
> >
> >  struct hv_hotadd_state {
> > @@ -505,8 +505,9 @@ enum hv_dm_state {
> >
> >  static __u8 recv_buffer[HV_HYP_PAGE_SIZE];
> >  static __u8 balloon_up_send_buffer[HV_HYP_PAGE_SIZE];
> > +static unsigned long ha_pages_in_chunk;
> > +
> >  #define PAGES_IN_2M (2 * 1024 * 1024 / PAGE_SIZE)
> > -#define HA_CHUNK (128 * 1024 * 1024 / PAGE_SIZE)
> >
> >  struct hv_dynmem_device {
> >  	struct hv_device *dev;
> > @@ -724,21 +725,21 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
> >  	unsigned long processed_pfn;
> >  	unsigned long total_pfn = pfn_count;
> >
> > -	for (i = 0; i < (size/HA_CHUNK); i++) {
> > -		start_pfn = start + (i * HA_CHUNK);
> > +	for (i = 0; i < (size/ha_pages_in_chunk); i++) {
> > +		start_pfn = start + (i * ha_pages_in_chunk);
> >
> >  		scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> > -			has->ha_end_pfn += HA_CHUNK;
> > -			processed_pfn = umin(total_pfn, HA_CHUNK);
> > +			has->ha_end_pfn += ha_pages_in_chunk;
> > +			processed_pfn = umin(total_pfn, ha_pages_in_chunk);
> >  			total_pfn -= processed_pfn;
> > -			has->covered_end_pfn += processed_pfn;
> > +			has->covered_end_pfn += processed_pfn;
> >  		}
> >
> >  		reinit_completion(&dm_device.ol_waitevent);
> >
> >  		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
> >  		ret = add_memory(nid, PFN_PHYS((start_pfn)),
> > -				(HA_CHUNK << PAGE_SHIFT), MHP_MERGE_RESOURCE);
> > +				(ha_pages_in_chunk << PAGE_SHIFT), MHP_MERGE_RESOURCE);
>
> HA_BYTES_IN_CHUNK might be reasonable to have (see below)
>
> >  		if (do_hot_add)
> > @@ -1807,10 +1808,13 @@ static int balloon_connect_vsp(struct hv_device *dev)
> >  	cap_msg.caps.cap_bits.hot_add = hot_add_enabled();
> >
> >  	/*
> > -	 * Specify our alignment requirements as it relates
> > -	 * memory hot-add. Specify 128MB alignment.
> > +	 * Specify our alignment requirements for memory hot-add. The value is
> > +	 * the log base 2 of the number of megabytes in a chunk. For example,
> > +	 * with 256 MiB chunks, the value is 8. The number of MiB in a chunk
> > +	 * must be a power of 2.
> >  	 */
> > -	cap_msg.caps.cap_bits.hot_add_alignment = 7;
> > +	cap_msg.caps.cap_bits.hot_add_alignment =
> > +		ilog2(ha_pages_in_chunk >> (20 - PAGE_SHIFT));
>
> I was wondering if we can remove some of the magic here. Something along
> the lines of:
>
> 	ilog2(ha_pages_in_chunk / (SZ_1M >> PAGE_SHIFT))
>
> or simply
>
> 	#define HA_BYTES_IN_CHUNK (ha_pages_in_chunk << PAGE_SHIFT)
>
> 	ilog2(HA_BYTES_IN_CHUNK / SZ_1M)
>
> Apart from that nothing jumped at me; looks much cleaner.
>
> Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>

David --

I need to respin anyway because I missed a dependency on
CONFIG_MEMORY_HOTPLUG, as pointed out by the kernel test robot. I'll add
your suggestion to that respin.

Michael