Hi, Manivannan,

On Mon, May 29, 2023 at 1:39 PM Manivannan Sadhasivam
<manivannan.sadhasivam@xxxxxxxxxx> wrote:
>
> On Mon, May 29, 2023 at 10:02:20AM +0800, Huacai Chen wrote:
> > Hi, Manivannan,
> >
> > On Mon, May 29, 2023 at 12:57 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@xxxxxxxxxx> wrote:
> > >
> > > On Thu, May 25, 2023 at 05:14:28PM +0800, Huacai Chen wrote:
> > > > Hi, Bjorn,
> > > >
> > > > On Wed, May 24, 2023 at 11:21 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > >
> > > > > [+cc Marc, LKML]
> > > > >
> > > > > On Wed, May 24, 2023 at 05:36:23PM +0800, Huacai Chen wrote:
> > > > > > Some platforms (such as LoongArch) cannot provide enough irq numbers as
> > > > > > many as logical cpu numbers. So we should limit pci irq numbers when
> > > > > > allocate msi/msix vectors, otherwise some device drivers may fail at
> > > > > > initialization. This patch add a cmdline parameter "pci_irq_limit=xxxx"
> > > > > > to control the limit.
> > > > > >
> > > > > > The default pci msi/msix number limit is defined 32 for LoongArch and
> > > > > > NR_IRQS for other platforms.
> > > > >
> > > > > The IRQ experts can chime in on this, but this doesn't feel right to
> > > > > me. I assume arch code should set things up so only valid IRQ numbers
> > > > > can be allocated. This doesn't seem necessarily PCI-specific, I'd
> > > > > prefer to avoid an arch #ifdef here, and I'd also prefer to avoid a
> > > > > command-line parameter that users have to discover and supply.
> > > > The problem we meet: LoongArch machines can have as many as 256
> > > > logical cpus, and the maximum of msi vectors is 192. Even on a 64-core
> > > > machine, 192 irqs can be easily exhausted if there are several NICs
> > > > (NIC usually allocates msi irqs depending on the number of online
> > > > cpus). So we want to limit the msi allocation.
> > > >
> > >
> > > If the MSI allocation fails with multiple vectors, then the NIC driver should
> > > revert to a single MSI vector. Is that happening in your case?
> > Thank you for pointing this out. Yes, I know most existing drivers
> > will fallback to use single msi or legacy irqs when failed. However,
> > as I replied in another thread (the new solution of this problem [1]), we
> > want to do some proactive throttling rather than consume msi vectors
> > aggressively. For example, if we have two NICs, we want both of them
> > to get 32 msi vectors; not one exhaust all available vectors, and the
> > other fallback to use single msi or legacy irq.
> >
> > I hope I have explained clearly, thanks.
> >
>
> The problem you are facing is not specific to Loongsoon but rather generic. And
> the solution we have currently is what you were also aware of it seems. So if
> you want to propose an alternative solution, it should be generic and also a
> good justification needs to be provided to the maintainers i.e., comparing two
> solutions and why yours is better.
Yes, I think we are facing a generic problem, but it is more obvious
on platforms which provide fewer MSI vectors. And my solution seems
generic enough. :)
At least in my example, "proactive throttling" is better than
"aggressive consuming", because two (or more) NICs get more balanced
throughput.

>
> But IMO what you are proposing seems like usecase driven and may not work all
> the time due to architecture limitation. This again proves that the existing
> solution is sufficient enough.
Yes, it's a use-case-driven solution, so I provide a cmdline parameter
to let the user decide.

Huacai
>
> - Mani
>
> > [1] https://lore.kernel.org/lkml/20230527054633.704916-1-chenhuacai@xxxxxxxxxxx/T/#t
> >
> > Huacai
> >
> > >
> > > - Mani
> > >
> > > > This is not a LoongArch-specific problem, because I think other
> > > > platforms can also meet if they have many NICs. But of course,
> > > > LoongArch can meet it more easily because the available msi vectors
> > > > are very few. So, adding a cmdline parameter is somewhat reasonable.
> > > >
> > > > After some investigation, I think it may be possible to modify
> > > > drivers/irqchip/irq-loongson-pch-msi.c and override
> > > > msi_domain_info::domain_alloc_irqs() to limit msi allocation. However,
> > > > doing that need to remove the "static" before
> > > > __msi_domain_alloc_irqs(), which means revert
> > > > 762687ceb31fc296e2e1406559e8bb5 ("genirq/msi: Make
> > > > __msi_domain_alloc_irqs() static"), I don't know whether that is
> > > > acceptable.
> > > >
> > > > If such a revert is not acceptable, it seems that we can only use the
> > > > method in this patch. Maybe rename pci_irq_limits to pci_msi_limits is
> > > > a little better.
> > > >
> > > > Huacai
> > > >
> > > > >
> > > > > > Signed-off-by: Juxin Gao <gaojuxin@xxxxxxxxxxx>
> > > > > > Signed-off-by: Huacai Chen <chenhuacai@xxxxxxxxxxx>
> > > > > > ---
> > > > > >  drivers/pci/msi/msi.c | 26 +++++++++++++++++++++++++-
> > > > > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
> > > > > > index ef1d8857a51b..6617381e50e7 100644
> > > > > > --- a/drivers/pci/msi/msi.c
> > > > > > +++ b/drivers/pci/msi/msi.c
> > > > > > @@ -402,12 +402,34 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
> > > > > >          return ret;
> > > > > >  }
> > > > > >
> > > > > > +#ifdef CONFIG_LOONGARCH
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS 32
> > > > > > +#else
> > > > > > +#define DEFAULT_PCI_IRQ_LIMITS NR_IRQS
> > > > > > +#endif
> > > > > > +
> > > > > > +static int pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +static int __init pci_irq_limit(char *str)
> > > > > > +{
> > > > > > +        get_option(&str, &pci_irq_limits);
> > > > > > +
> > > > > > +        if (pci_irq_limits == 0)
> > > > > > +                pci_irq_limits = DEFAULT_PCI_IRQ_LIMITS;
> > > > > > +
> > > > > > +        return 0;
> > > > > > +}
> > > > > > +
> > > > > > +early_param("pci_irq_limit", pci_irq_limit);
> > > > > > +
> > > > > >  int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
> > > > > >                             struct irq_affinity *affd)
> > > > > >  {
> > > > > >          int nvec;
> > > > > >          int rc;
> > > > > >
> > > > > > +        maxvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > > +
> > > > > >          if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
> > > > > >                  return -EINVAL;
> > > > > >
> > > > > > @@ -776,7 +798,9 @@ static bool pci_msix_validate_entries(struct pci_dev *dev, struct msix_entry *en
> > > > > >  int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
> > > > > >                              int maxvec, struct irq_affinity *affd, int flags)
> > > > > >  {
> > > > > > -        int hwsize, rc, nvec = maxvec;
> > > > > > +        int hwsize, rc, nvec;
> > > > > > +
> > > > > > +        nvec = clamp_val(maxvec, 0, pci_irq_limits);
> > > > > >
> > > > > >          if (maxvec < minvec)
> > > > > >                  return -ERANGE;
> > > > > > --
> > > > > > 2.39.1
> > > > > >
> > >
> > > --
> > > மணிவண்ணன் சதாசிவம்
>
> --
> மணிவண்ணன் சதாசிவம்
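
To make the "proactive throttling" versus "aggressive consuming" comparison in this thread concrete, here is a minimal userspace sketch. It is only a model and not kernel code: struct vec_pool, clamp_int() and alloc_vec_range() are hypothetical names invented for illustration, the pool is a plain counter, and the -1 return stands in for whatever error a real allocator would report. The only parts taken from the thread are the idea of clamping maxvec to a limit before allocating, and the numbers 192 (available vectors), 256 (logical cpus) and 32 (proposed default limit).

```c
#include <limits.h>

/* Hypothetical model of a platform-wide pool of MSI vectors. */
struct vec_pool {
    int free_vecs;
};

/* Plain-C stand-in for the clamping done in the patch. */
static int clamp_int(int val, int lo, int hi)
{
    if (val < lo)
        return lo;
    if (val > hi)
        return hi;
    return val;
}

/*
 * Model of a ranged allocation: cap the request at `limit`, then grant
 * as many vectors as the pool still has. Returns the number of vectors
 * granted, or -1 if fewer than `minvec` could be provided.
 */
static int alloc_vec_range(struct vec_pool *pool, int minvec, int maxvec,
                           int limit)
{
    int nvec = clamp_int(maxvec, 0, limit);

    if (nvec > pool->free_vecs)
        nvec = pool->free_vecs;
    if (nvec < minvec)
        return -1;

    pool->free_vecs -= nvec;
    return nvec;
}
```

In this model, with 192 free vectors and two NICs each asking for between 1 and 256 vectors, passing INT_MAX as the limit gives the first NIC all 192 vectors and leaves the second with an error (the case where a driver would fall back to a single MSI or a legacy irq). Passing 32 as the limit gives each NIC 32 vectors and leaves 128 free.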