On 9/18/24 12:31, Alejandro Lucero Palau wrote:
>
> On 9/16/24 19:55, Wei Huang wrote:
>>
>>
>> On 9/11/24 10:37 AM, Alejandro Lucero Palau wrote:
>>> ...
>>>
>>> I understand just one cpu from the mask has to be used, but I wonder if
>>> some check should be done for ensuring the mask is not mad.
>>>
>>> This is control path and the related queue is going to be restarted, so
>>> maybe a sanity check for ensuring all the cpus in the mask are from the
>>> same CCX complex?
>>
>> I don't think this is always true and we shouldn't warn when this
>> happens. There is only one ST can be supported, so the driver need to
>> make a good judgement on which ST to be used. But no matter what, ST
>> is just a hint - it shouldn't cause any correctness issues in HW, even
>> when it is not the optimal target CPU. So warning is unnecessary.
>>
>
> 1) You can use a "mad" mask for avoiding a specific interrupt to disturb
> a specific execution is those cores not part of the mask. But I argue
> the ST hint should not be set then.
>
>
> 2) Someone, maybe an automatic script, could try to get the best
> performance possible, and a "mad" mask could preclude such outcome
> inadvertently.
>

For this case, you can use the following command:

    echo cpu_id > /proc/irq/nnn/smp_affinity_list

where nnn is the MSI IRQ number associated with the device. This forces
the IRQ to be associated with only one specific CPU.

>
> I agree a warning could not be a good idea because 1, but I would say
> adding some way of traceability here could be interesting. A tracepoint
> or a new ST field for last hint set for that interrupt/queue.

We do have two pci_dbg() calls in tph.c. You can see the logs with the
proper kernel print level. The logs show the GET/SET ST values, including
which PCIe device, which ST table, and which index they apply to.

>
>
>>>
>>> That would be an iteration checking the tag is the same one for all of
>>> them. If not, at least a warning stating the tag/CCX/cpu used.
>>>
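
For anyone scripting the smp_affinity_list command above, here is a
minimal sketch of a helper that pins one IRQ to one CPU and reads the
setting back. The function name pin_irq and the PROC_IRQ_ROOT override
are hypothetical, added only so the helper can be tried against a scratch
directory; they are not part of the patch series. Writing under /proc/irq
normally requires root, and the kernel may reject the write (for example,
if the CPU is offline).

```shell
#!/bin/sh
# Sketch only: pin a single IRQ to a single CPU via smp_affinity_list.
# PROC_IRQ_ROOT is a hypothetical override for dry runs; the real
# interface lives under /proc/irq.
PROC_IRQ_ROOT="${PROC_IRQ_ROOT:-/proc/irq}"

pin_irq() {
    irq="$1"    # MSI/MSI-X IRQ number of the device (the "nnn" above)
    cpu="$2"    # the single target CPU id (the "cpu_id" above)
    file="$PROC_IRQ_ROOT/$irq/smp_affinity_list"

    # Fail early if the IRQ does not exist or we lack permission.
    if [ ! -w "$file" ]; then
        echo "cannot write $file (no such IRQ, or not root?)" >&2
        return 1
    fi

    # Restrict the IRQ to exactly one CPU; the kernel may refuse.
    echo "$cpu" > "$file" || return 1

    # Read the list back so the caller can confirm what was stored.
    cat "$file"
}
```

Usage would look like `pin_irq nnn cpu_id`, with the same placeholders as
in the command above. With a single CPU in the mask, the driver has only
one candidate when choosing the ST, which is the point of the suggestion.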