RE: [PATCH] PCI: hv: Fix NUMA node assignment when kernel boots with parameters affecting NUMA topology

From: longli@xxxxxxxxxxxxxxxxx <longli@xxxxxxxxxxxxxxxxx> Sent: Thursday, January 6, 2022 3:20 PM
> 
> When the kernel boots with parameters restricting the number of cpus or NUMA
> nodes, e.g. maxcpus=X or numa=off, the vPCI driver should only set the NUMA
> node to a value that is valid in the current running kernel.
> 
> Signed-off-by: Long Li <longli@xxxxxxxxxxxxx>
> ---
>  drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index fc1a29acadbb..8686343eff4c 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1835,8 +1835,21 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
>  		if (!hv_dev)
>  			continue;
> 
> -		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> -			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> +		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY) {
> +			int cpu;
> +			bool found_node = false;
> +
> +			for_each_possible_cpu(cpu)
> +				if (cpu_to_node(cpu) ==
> +				    hv_dev->desc.virtual_numa_node) {
> +					found_node = true;
> +					break;
> +				}
> +
> +			if (found_node)
> +				set_dev_node(&dev->dev,
> +					     hv_dev->desc.virtual_numa_node);
> +		}

I'm wondering about this approach vs. just comparing against nr_node_ids.
Comparing against nr_node_ids would handle the case of numa=off on the
kernel boot line, or a kernel built with CONFIG_NUMA=n, or the use of
numa=fake.  Your approach is also affected by which CPUs are online,
since cpu_to_node() references percpu data.  It would seem to produce
more variable results since CPUs can go online and offline while the VM
is running.  If a network VF device were removed and re-added, the results
of your algorithm could differ on the re-add, depending on which CPUs
were online at the time.

My impression (which may be incorrect) is that the device numa_node
is primarily to allow the driver to allocate memory from the closest
NUMA node, and such memory allocations don't really need to be
affected by which CPUs are online.

Thoughts?

> 
>  		put_pcichild(hv_dev);
>  	}
> --
> 2.25.1
