Re: Linux warns `Unknown NUMA node; performance will be reduced`

On 2024/6/11 4:27, Paul Menzel wrote:
> Dear Bjorn,
> 
> 
> Am 10.06.24 um 21:42 schrieb Bjorn Helgaas:
>> [+cc Yunsheng, thread at
>> https://lore.kernel.org/r/a154f694-c48b-4b3b-809f-4b74ec86a924@xxxxxxxxxxxxx]

Thanks for cc'ing.

>>
>> Thanks very much for this report!
> 
> Thank you for the quick reply.
> 
>> On Sun, Jun 09, 2024 at 10:31:05AM +0200, Paul Menzel wrote:
>>> On the servers below Linux warns:
>>>
>>>       Unknown NUMA node; performance will be reduced
>>
>> This warning was added by ad5086108b9f ("PCI: Warn if no host bridge
>> NUMA node info"), which appeared in v5.5, so I assume this isn't new.
>>
>> That commit log says:
>>
>>    In pci_call_probe(), we try to run driver probe functions on the node where
>>    the device is attached.  If we don't know which node the device is attached
>>    to, the driver will likely run on the wrong node.  This will still work,
>>    but performance will not be as good as it could be.
>>
>>    On NUMA systems, warn if we don't know which node a PCI host bridge is
>>    attached to.  This is likely an indication that ACPI didn't supply a _PXM
>>    method or the DT didn't supply a "numa-node-id" property.
>>
>> I assume these are all ACPI systems, so likely missing _PXM.  An
>> acpidump could confirm this.
> 
> I created an issue in the Linux Kernel Bugzilla [1] and attached the output of `acpidump` on a Dell PowerEdge T630 there. The DSDT contains:
> 
>         Device (PCI1)
>         {
>         […]
>             Method (_PXM, 0, NotSerialized)  // _PXM: Device Proximity
>             {
>                 If ((CLOD == 0x00))
>                 {
>                     Return (0x01)
>                 }
>                 Else
>                 {
>                     Return (0x02)
>                 }
>             }
>         […]
>         }
> 
>> I think the devices on buses 7f and ff are Intel chipset devices, and
>> I doubt we have drivers for any of them.  They have vendor/device IDs
>> of 8086:6fXX, and I didn't see any reference to them:
>>
>>    $ git grep -i \<0x6f..\>
>>    $
> 
> Interesting. Any idea what these chipset devices do?
> 
>> If we *did* have drivers, they would certainly benefit from having
>> _PXM, but since there are no probe methods, I don't think it matters
>> that we don't know where they should run.
>>
>> Maybe the message should be downgraded from "dev_warn" to "dev_info"
>> since there's no functional problem, and the user can't really do
>> anything about it.
>>
>> We could also consider moving it to the actual probe path, so we don't
>> emit a message unless there is an affected driver.

The problem seems to be how we decide whether there is an affected
driver. Do we care about out-of-tree drivers? Don't out-of-tree
drivers suffer from a similar problem if the BIOS does not provide
correct NUMA info?

The 'Unknown NUMA node; performance will be reduced' warning seems to
have been added to put some pressure on vendors to fix the BIOS as
quickly as possible. Downgrading it from "dev_warn" to "dev_info", or
moving it to the actual probe path, does not seem to fix the problem;
doesn't it just relieve the pressure on vendors to fix the BIOS?


> 
> Both ideas sound good, but I do not know the code at all.
> 
>>> 1.  [    0.000000] DMI: Dell Inc. PowerEdge R730/0H21J3, BIOS 2.13.0 05/14/2021
>>> 2.  [    0.000000] DMI: Dell Inc. PowerEdge R730/0H21J3, BIOS 2.2.5 09/06/2016
>>> 3.  [    0.000000] DMI: Dell Inc. PowerEdge R730xd/0WCJNT, BIOS 2.3.4 11/08/2016
>>> 4.  [    0.000000] DMI: Dell Inc. PowerEdge R910/0KYD3D, BIOS 2.10.0 08/29/2013
>>> 5.  [    0.000000] DMI: Dell Inc. PowerEdge R930/0T55KM, BIOS 2.8.1 01/02/2020
>>> 6.  [    0.000000] DMI: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.5.4 08/17/2017
>>> 7.  [    0.000000] DMI: Dell Inc. PowerEdge T630/0W9WXC, BIOS 1.5.4 10/04/2015
>>> 8.  [    0.000000] DMI: Dell Inc. PowerEdge T630/0W9WXC, BIOS 2.11.0 12/23/2019
>>> 9.  [    0.000000] DMI: Dell Inc. PowerEdge T630/0W9WXC, BIOS 2.1.5 04/13/2016
>>> 10. [    0.000000] DMI: Supermicro Super Server/X13SAE, BIOS 2.0 10/17/2022
>>> ...
>>
>>> 7f:08.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D QPI Link 0 [8086:6f80] (rev 01)
>>> 7f:08.2 Performance counters [1101]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D QPI Link 0 [8086:6f32] (rev 01)
>>> ...
>>
>>> ff:08.0 System peripheral [0880]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D QPI Link 0 [8086:6f80] (rev 01)
>>> ff:08.2 Performance counters [1101]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D QPI Link 0 [8086:6f32] (rev 01)
>>> ...
>>
>>
>>> [    0.000000] DMI: Dell Inc. PowerEdge T630/0NT78X, BIOS 2.4.2 01/09/2017
>>> ...
>>> [    4.398627] ACPI: PCI Root Bridge [UNC1] (domain 0000 [bus ff])
>>> [    4.437865] pci_bus 0000:ff: Unknown NUMA node; performance will be reduced
>>> ...
>>> [    4.901021] ACPI: PCI Root Bridge [UNC0] (domain 0000 [bus 7f])
>>> [    4.940865] pci_bus 0000:7f: Unknown NUMA node; performance will be reduced
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=218951
> 



