Hi,

Is anyone working on a fix for this issue?

Regards,
Bharat

> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
> Sent: Tuesday, September 13, 2016 8:35 PM
> To: Marc Zyngier <marc.zyngier@xxxxxxx>
> Cc: Bharat Kumar Gogada <bharatku@xxxxxxxxxx>; robh@xxxxxxxxxx;
> bhelgaas@xxxxxxxxxx; colin.king@xxxxxxxxxxxxx; Soren Brinkmann
> <sorenb@xxxxxxxxxx>; Michal Simek <michals@xxxxxxxxxx>; arnd@xxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Ravikiran Gummaluri <rgummal@xxxxxxxxxx>
> Subject: Re: [PATCH 3/3] PCI: Xilinx NWL PCIe: Fix Error for multi function device for legacy interrupts.
>
> On Tue, Sep 13, 2016 at 08:41:28AM +0100, Marc Zyngier wrote:
> > On 12/09/16 23:02, Bjorn Helgaas wrote:
> > > On Thu, Sep 01, 2016 at 05:19:55AM +0000, Bharat Kumar Gogada wrote:
> > >>>>>>> Hi Bharat,
> > >>>>>>>> @@ -561,7 +561,7 @@ static int nwl_pcie_init_irq_domain(struct nwl_pcie *pcie)
> > >>>>>>>>          }
> > >>>>>>>>
> > >>>>>>>>          pcie->legacy_irq_domain = irq_domain_add_linear(legacy_intc_node,
> > >>>>>>>> -                                INTX_NUM,
> > >>>>>>>> +                                INTX_NUM + 1,
> > >>>>>>>>                                  &legacy_domain_ops,
> > >>>>>>>>                                  pcie);
> > >>>>>>>
> > >>>>>>> This feels like the wrong thing to do. You have INTX_NUM irqs,
> > >>>>>>> so the domain allocation should reflect this. On the other
> > >>>>>>> hand, the way the driver currently deals with mappings is
> > >>>>>>> quite broken (consistently adding 1 to the HW interrupt).
> > >>>>>>>
> > >>>>>> Hi Marc,
> > >>>>>>
> > >>>>>> Without the above change, I get the following crash in the kernel while booting.
> > >>>>>>
> > >>>>>> [ 2.441684] error: hwirq 0x4 is too large for dummy
> > >>>>>> [ 2.441694] ------------[ cut here ]------------
> > >>>>>> [ 2.441698] WARNING: at kernel/irq/irqdomain.c:344
> > >>>>>> [ 2.441702] Modules linked in:
> > >>>>>> [ 2.441706]
> > >>>>>> [ 2.441714] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.4.0 #8
> > >>>>>> [ 2.441718] Hardware name: xlnx,zynqmp (DT)
> > >>>>>> [ 2.441723] task: ffffffc071886b80 ti: ffffffc071888000 task.ti: ffffffc071888000
> > >>>>>> [ 2.441732] PC is at irq_domain_associate+0x138/0x1c0
> > >>>>>> [ 2.441738] LR is at irq_domain_associate+0x138/0x1c0
> > >>>>>>
> > >>>>>> The check is in irq_domain_associate() in kernel/irq/irqdomain.c:
> > >>>>>>
> > >>>>>>         if (WARN(hwirq >= domain->hwirq_max,
> > >>>>>>                  "error: hwirq 0x%x is too large for %s\n",
> > >>>>>>                  (int)hwirq, domain->name))
> > >>>>>>                 return -EINVAL;
> > >>>>>>
> > >>>>>> Without the above change (INTX_NUM + 1), hwirq and hwirq_max are
> > >>>>>> both equal to 4, which is why the crash occurs. This happens
> > >>>>>> because the legacy interrupts start from 1 (INTA).
> > >>>>>
> > >>>>> I understood that. I'm still persisting in saying that you have the wrong fix.
> > >>>>>
> > >>>>> Your domain should always allocate as many interrupts as you have
> > >>>>> interrupt sources. These interrupts (hwirq) should be numbered
> > >>>>> from 0 to (n-1).
> > >>>>
> > >>>> Agreed, but here is the problem: the hwirq values for legacy
> > >>>> interrupts range from 0x1 to 0x4 (INTA to INTD), and these
> > >>>> values are defined by the PCIe specification for legacy interrupts,
> > >>>> so they cannot be numbered from 0. So when 0x4 (INTD) arrives for a
> > >>>> multi-function device, the crash occurs.
> > >>>
> > >>> So who provides this hwirq? Who calls irq_domain_associate() with
> > >>> hwirq set to 4?
> > >>>
> > >> The PCIe subsystem invokes pcibios_add_device() in arch/arm64/kernel/pci.c
> > >> for every PCI device. The purpose of this function is to assign dev->irq
> > >> using of_irq_parse_and_map_pci().
> > >> of_irq_parse_and_map_pci() invokes of_irq_parse_pci(), which reads
> > >> PCI_INTERRUPT_PIN from configuration space and saves it in a parameter
> > >> of struct of_phandle_args.
> > >> This structure is passed to irq_create_of_mapping(), which invokes
> > >> irq_create_fwspec_mapping().
> > >> irq_create_fwspec_mapping() invokes irq_domain_translate() to get the
> > >> hwirq; here the PCI_INTERRUPT_PIN value saved above is assigned to
> > >> hwirq (*hwirq = fwspec->param[0]).
> > >> Then, using this hwirq, irq_create_mapping() -> irq_domain_associate()
> > >> are invoked, and a mapping is created for the virtual irq with this hwirq.
> > >> So for any endpoint, the PCI_INTERRUPT_PIN value ranges from 0x1 to 0x4,
> > >> and so hwirq ranges from 0x1 to 0x4.
> > >>
> > >> So the values are generic with respect to the protocol; that's why hwirq
> > >> ranges from 0x1 to 0x4.
> > >> And if you check pcie-altera.c, it also adds one, both in its handler
> > >> and when creating the legacy domain.
> > >
> > > Is this resolved yet? Marc, are you happy, or should we iterate on
> > > this again?
> >
> > Ah, sorry to have dropped the ball on this patch.
>
> No problem, I wasn't making forward progress anyway.
>
> > I guess that given that the infrastructure imposes the hwirq range on
> > the host drivers, Bharat's approach is the only way (and a number of
> > other host drivers are already slightly broken). I'll try and have a
> > look at solving this at the generic level. In the meantime:
> >
> > Acked-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>
> After looking at this myself, I'm not happy with this either. It feels like there are
> bugs lurking here and we're just hiding one of them.
>
> Here are the callers of irq_domain_add_linear() for legacy INTx in
> drivers/pci/host:
>
>   advk_pcie_init_irq_domain      LEGACY_IRQ_NUM (4)
>   dra7xx_pcie_init_irq_domain    4
>   ks_dw_pcie_host_init           MAX_LEGACY_IRQS (4)
>   altera_pcie_init_irq_domain    INTX_NUM + 1 (5)
>   nwl_pcie_init_irq_domain       INTX_NUM + 1 (5)
>   xilinx_pcie_init_irq_domain    4
>
> I think all of these use the of_irq_parse_and_map_pci() path you mentioned, so
> if the problem is in the way that path works, I would think these should *all* be
> requesting the same number of interrupts in the domain.
>
> I agree with Marc that we should request 4 IRQs, because that's what we need.
> If we can't do that for some reason, we ought to at least make all these callers
> the same.
>
> Bjorn
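The off-by-one at the heart of this thread can be illustrated with a small user-space C model. This is a hypothetical sketch, not kernel code: struct toy_domain, toy_associate(), pin_to_hwirq_direct(), pin_to_hwirq_zero_based(), status_bit_to_hwirq() and count_mapped() are invented names, INTX_NUM is assumed to be 4, and the status-register layout in the handler helper is an assumption. Only the range check mirrors the irq_domain_associate() snippet quoted in the thread. The model compares using PCI_INTERRUPT_PIN directly as the hwirq (1..4, which needs a 5-entry domain) against translating it to a 0-based hwirq (0..3, which fits the 4-entry domain Marc and Bjorn ask for).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define INTX_NUM 4 /* INTA..INTD; assumed to match the driver's INTX_NUM */

/* Hypothetical stand-in for a linear irq_domain: only its size matters here. */
struct toy_domain {
	const char *name;
	unsigned long hwirq_max; /* valid hwirqs are 0 .. hwirq_max - 1 */
};

/* Mirrors the range check quoted from irq_domain_associate(). */
static int toy_associate(const struct toy_domain *d, unsigned long hwirq)
{
	if (hwirq >= d->hwirq_max) {
		fprintf(stderr, "error: hwirq 0x%lx is too large for %s\n",
			hwirq, d->name);
		return -EINVAL;
	}
	return 0;
}

/* Current scheme: PCI_INTERRUPT_PIN (1 = INTA .. 4 = INTD) is used as hwirq. */
static unsigned long pin_to_hwirq_direct(unsigned int pin)
{
	assert(pin >= 1 && pin <= INTX_NUM);
	return pin;
}

/* Alternative: translate the pin to a 0-based hwirq (INTA -> 0 .. INTD -> 3). */
static int pin_to_hwirq_zero_based(unsigned int pin, unsigned long *hwirq)
{
	if (pin < 1 || pin > INTX_NUM)
		return -EINVAL; /* pin 0 means "no INTx"; values above 4 are invalid */
	*hwirq = pin - 1;
	return 0;
}

/*
 * Handler side, assuming a hypothetical status register where bit 0 = INTA
 * .. bit 3 = INTD: the direct scheme must look up bit + 1, while the 0-based
 * scheme uses the bit index as-is.
 */
static unsigned long status_bit_to_hwirq(unsigned int bit, int zero_based)
{
	return zero_based ? bit : bit + 1;
}

/* Count how many of INTA..INTD can be mapped into domain d under a scheme. */
static int count_mapped(const struct toy_domain *d, int zero_based)
{
	int ok = 0;

	for (unsigned int pin = 1; pin <= INTX_NUM; pin++) {
		unsigned long hwirq;

		if (zero_based) {
			if (pin_to_hwirq_zero_based(pin, &hwirq) != 0)
				continue;
		} else {
			hwirq = pin_to_hwirq_direct(pin);
		}
		if (toy_associate(d, hwirq) == 0)
			ok++;
	}
	return ok;
}
```

With a 4-entry domain, the direct scheme maps INTA..INTC and rejects INTD (hwirq 4), which matches the reported warning; the INTX_NUM + 1 workaround hides this, while the 0-based translation maps all four pins into exactly INTX_NUM slots. If one took the 0-based route in a real driver, a natural place for the subtraction would seem to be an .xlate callback in the domain's irq_domain_ops, since, as far as we understand it, irq_domain_translate() only falls back to fwspec->param[0] when the domain supplies no translation callback. This is a suggestion, not something the thread settles.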