From: mhkelley58@xxxxxxxxx <mhkelley58@xxxxxxxxx> Sent: Wednesday, October 2, 2024 8:53 PM > > Code specific to Hyper-V guests currently assumes the cpu_possible_mask > is "dense" -- i.e., all bit positions 0 thru (nr_cpu_ids - 1) are set, > with no "holes". Therefore, num_possible_cpus() is assumed to be equal > to nr_cpu_ids. > > Per a separate discussion[1], this assumption is not valid in the > general case. For example, the function setup_nr_cpu_ids() in > kernel/smp.c is coded to assume cpu_possible_mask may be sparse, > and other patches have been made in the past to correctly handle > the sparseness. See bc75e99983df1efd ("rcu: Correctly handle sparse > possible cpu") as noted by Mark Rutland. > > The general case notwithstanding, the configurations that Hyper-V > provides to guest VMs on x86 and ARM64 hardware, in combination > with the algorithms currently used by architecture specific code > to assign Linux CPU numbers, *does* always produce a dense > cpu_possible_mask. So the invalid assumption is not currently > causing failures. But in the interest of correctness, and robustness > against future changes in the code that populates cpu_possible_mask, > update the Hyper-V code to no longer assume denseness. > > The typical code pattern with the invalid assumption is as follows: > > array = kcalloc(num_possible_cpus(), sizeof(<some struct>), > GFP_KERNEL); > .... > index into "array" with smp_processor_id() > > In such as case, the array might be indexed by a value beyond the size > of the array. The correct approach is to allocate the array with size > "nr_cpu_ids". While this will probably leave unused any array entries > corresponding to holes in cpu_possible_mask, the holes are assumed to > be minimal and hence the amount of memory wasted by unused entries is > minimal. > > Removing the assumption in Hyper-V code is done in several patches > because they touch different kernel subsystems: > > Patch 1: Hyper-V x86 initialization of hv_vp_assist_page (there's no > hv_vp_assist_page on ARM64) > Patch 2: Hyper-V common init of hv_vp_index > Patch 3: Hyper-V IOMMU driver > Patch 4: storvsc driver > Patch 5: netvsc driver Wei -- Could you pick up Patches 1, 2, and 3 in this series for the hyperv-next tree? Peter Zijlstra acked the full series [2], and Patches 4 and 5 have already been picked by the SCSI and net maintainers respectively [3][4]. Let me know if you have any concerns. Thanks, Michael [2] https://lore.kernel.org/linux-hyperv/20241004100742.GO18071@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ [3] https://lore.kernel.org/linux-hyperv/yq15xnsjlc1.fsf@xxxxxxxxxxxxxxxxxxxx/ [4] https://lore.kernel.org/linux-hyperv/172808404024.2772330.2975585273609596688.git-patchwork-notify@xxxxxxxxxx/ > > I tested the changes by hacking the construction of cpu_possible_mask > to include a hole on x86. With a configuration set to demonstrate the > problem, a Hyper-V guest kernel eventually crashes due to memory > corruption. After the patches in this series, the crash does not occur. > > [1] https://lore.kernel.org/lkml/SN6PR02MB4157210CC36B2593F8572E5ED4692@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > Michael Kelley (5): > x86/hyperv: Don't assume cpu_possible_mask is dense > Drivers: hv: Don't assume cpu_possible_mask is dense > iommu/hyper-v: Don't assume cpu_possible_mask is dense > scsi: storvsc: Don't assume cpu_possible_mask is dense > hv_netvsc: Don't assume cpu_possible_mask is dense > > arch/x86/hyperv/hv_init.c | 2 +- > drivers/hv/hv_common.c | 4 ++-- > drivers/iommu/hyperv-iommu.c | 4 ++-- > drivers/net/hyperv/netvsc_drv.c | 2 +- > drivers/scsi/storvsc_drv.c | 13 ++++++------- > 5 files changed, 12 insertions(+), 13 deletions(-) > > -- > 2.25.1 >