Re: [PATCH V4 11/12] bnxt_en: Add TPH support in BNXT driver

Wei Huang <wei.huang2@xxxxxxx> · Thu, 19 Sep 2024 11:14:27 -0500

On 9/18/24 12:31, Alejandro Lucero Palau wrote:
> 
> On 9/16/24 19:55, Wei Huang wrote:
>>
>>
>> On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>>>
...
>>>
>>> I understand just one cpu from the mask has to be used, but I wonder if
>>> some check should be done for ensuring the mask is not mad.
>>>
>>> This is control path and the related queue is going to be restarted, so
>>> maybe a sanity check for ensuring all the cpus in the mask are from the
>>> same CCX complex?
>>
>> I don't think this is always true, and we shouldn't warn when this
>> happens. Only one ST can be supported, so the driver needs to make a
>> good judgement on which ST to use. But no matter what, the ST is
>> just a hint - it shouldn't cause any correctness issues in HW, even
>> when it is not the optimal target CPU. So a warning is unnecessary.
>>
> 
> 1) You can use a "mad" mask to keep a specific interrupt from
> disturbing execution on the cores that are not part of the mask. But
> I argue the ST hint should not be set then.
> 
> 
> 2) Someone, maybe an automatic script, could try to get the best
> performance possible, and a "mad" mask could preclude such outcome
> inadvertently.
> 

For this case, you can use the following command:

echo cpu_id > /proc/irq/nnn/smp_affinity_list

where nnn is the MSI IRQ number associated with the device. This forces
the IRQ to be associated with only one specific CPU.
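
A minimal sketch of that step, assuming root privileges. The helper
name pin_irq and its optional third argument (a /proc root override,
only meant for dry runs) are hypothetical and not part of any existing
tool:

```shell
#!/bin/sh
# Hypothetical helper: pin one IRQ to a single CPU and print the
# affinity list that was read back afterwards.
# usage: pin_irq IRQ CPU [PROC_ROOT]
pin_irq() {
    irq=$1
    cpu=$2
    root=${3:-/proc}    # assumption: override only for dry runs
    file="$root/irq/$irq/smp_affinity_list"

    if [ ! -w "$file" ]; then
        echo "pin_irq: cannot write $file" >&2
        return 1
    fi

    # Writing a single CPU id leaves exactly one CPU in the mask.
    echo "$cpu" > "$file" || return 1

    # Read the list back to confirm what was accepted.
    cat "$file"
}
```

For example, "pin_irq 123 4" would pin IRQ 123 to CPU 4; both numbers
are placeholders, and the real IRQ number for the device can be looked
up in /proc/interrupts.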

> 
> I agree a warning might not be a good idea because of 1), but I would
> say adding some form of traceability here could be useful: a
> tracepoint, or a new ST field for the last hint set for that
> interrupt/queue.

We do have two pci_dbg() calls in tph.c. You can see the logs with the
proper kernel print level. They show the GET/SET ST values along with
the PCIe device, the ST table, and the index.
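
One possible way to turn those messages on at run time, sketched under
the assumption that the kernel is built with CONFIG_DYNAMIC_DEBUG and
that debugfs is mounted at /sys/kernel/debug. The helper name
enable_tph_dbg and the DDEBUG_CTL variable are hypothetical:

```shell
#!/bin/sh
# Assumption: default dynamic debug control file; override DDEBUG_CTL
# if debugfs is mounted somewhere else.
DDEBUG_CTL=${DDEBUG_CTL:-/sys/kernel/debug/dynamic_debug/control}

# Hypothetical helper: enable printing for the debug call sites in
# tph.c (dynamic debug matches "file" against the source basename).
enable_tph_dbg() {
    if [ ! -w "$DDEBUG_CTL" ]; then
        echo "enable_tph_dbg: $DDEBUG_CTL is not writable" >&2
        return 1
    fi
    echo 'file tph.c +p' > "$DDEBUG_CTL"
}
```

After calling enable_tph_dbg as root, the messages should show up in
the kernel log (for example via dmesg) the next time an ST value is
read or written.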

> 
> 
>>>
>>> That would be an iteration checking the tag is the same one for all of
>>> them. If not, at least a warning stating the tag/CCX/cpu used.
>>>