RE: [PATCH] PCI: hv: Fix NUMA node assignment when kernel boots with parameters affecting NUMA topology

> Subject: RE: [PATCH] PCI: hv: Fix NUMA node assignment when kernel boots
> with parameters affecting NUMA topology
> 
> From: longli@xxxxxxxxxxxxxxxxx <longli@xxxxxxxxxxxxxxxxx> Sent:
> Thursday, January 6, 2022 3:20 PM
> >
> > When the kernel boots with parameters restricting the number of CPUs
> > or NUMA nodes, e.g. maxcpus=X or numa=off, the vPCI driver should set
> > the NUMA node only to a value that is valid in the currently running kernel.
> >
> > Signed-off-by: Long Li <longli@xxxxxxxxxxxxx>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++++--
> >  1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index fc1a29acadbb..8686343eff4c 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -1835,8 +1835,21 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> >  		if (!hv_dev)
> >  			continue;
> >
> > -		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> > -			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> > +		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY) {
> > +			int cpu;
> > +			bool found_node = false;
> > +
> > +			for_each_possible_cpu(cpu)
> > +				if (cpu_to_node(cpu) ==
> > +				    hv_dev->desc.virtual_numa_node) {
> > +					found_node = true;
> > +					break;
> > +				}
> > +
> > +			if (found_node)
> > +				set_dev_node(&dev->dev,
> > +					     hv_dev->desc.virtual_numa_node);
> > +		}
> 
> I'm wondering about this approach vs. just comparing against nr_node_ids.

I initially tried to fix this by comparing against nr_node_ids. That worked for numa=off, but it didn't work with maxcpus=X.
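
Roughly, the check I tried looked like this (a sketch, not the exact code I had):

		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY &&
		    hv_dev->desc.virtual_numa_node < nr_node_ids)
			set_dev_node(&dev->dev,
				     hv_dev->desc.virtual_numa_node);

With maxcpus=X the reported node can be below nr_node_ids and still not be set up, so the comparison passes but the node is unusable.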

maxcpus=X is commonly used in kdump kernels. In that configuration, the memory subsystem is initialized such that only the NUMA nodes belonging to those X CPUs are set up and usable by drivers.
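
As an illustration (not taken from this report), a kdump kernel command line typically carries something like:

	... irqpoll maxcpus=1 reset_devices

so only a single CPU is brought up in the crash kernel.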

> Comparing against nr_node_ids would handle the case of numa=off on the
> kernel boot line, or a kernel built with CONFIG_NUMA=n, or the use of
> numa=fake.  Your approach is also affected by which CPUs are online, since
> cpu_to_node() references percpu data.  It would seem to produce more
> variable results since CPUs can go online and offline while the VM is running.
> If a network VF device was removed and re-added, the results of your
> algorithm could be different for the re-add, depending on which CPUs were
> online at the time.
> 
> My impression (which may be incorrect) is that the device numa_node is
> primarily to allow the driver to allocate memory from the closest NUMA node,
> and such memory allocations don't really need to be affected by which CPUs
> are online.

Yes, that is the reason I'm using for_each_possible_cpu(). Even if some CPUs are not online, the memory subsystem is set up so that drivers can allocate memory on that NUMA node. The loop guarantees that the NUMA node value is valid when set_dev_node() is called.
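
Stated as a standalone helper, the check amounts to this (illustrative only; hv_numa_node_is_valid() is not part of the patch):

	/* A node is treated as usable if at least one possible CPU maps to
	 * it; that CPU does not have to be online.
	 */
	static bool hv_numa_node_is_valid(int node)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			if (cpu_to_node(cpu) == node)
				return true;

		return false;
	}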

> 
> Thoughts?
> 
> >
> >  		put_pcichild(hv_dev);
> >  	}
> > --
> > 2.25.1




