On Fri, Apr 17, 2015 at 5:45 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: > On 17/04/15 13:37, Duc Dang wrote: >> On Fri, Apr 17, 2015 at 3:17 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: >>> On 17/04/15 11:00, Duc Dang wrote: >>>> On Wed, Apr 15, 2015 at 1:16 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote: >>>>> On Tue, 14 Apr 2015 19:20:19 +0100 >>>>> Duc Dang <dhdang@xxxxxxx> wrote: >>>>> >>>>>> On Sat, Apr 11, 2015 at 5:06 AM, Marc Zyngier <marc.zyngier@xxxxxxx> >>>>>> wrote: >>>>>>> On 2015-04-11 00:42, Duc Dang wrote: >>>>>>>> >>>>>>>> On Fri, Apr 10, 2015 at 10:20 AM, Marc Zyngier >>>>>>>> <marc.zyngier@xxxxxxx> wrote: >>>>>>>>> >>>>>>>>> On 09/04/15 18:05, Duc Dang wrote: >>>>>>>>>> >>>>>>>>>> X-Gene v1 SoC supports total 2688 MSI/MSIX vectors coalesced into >>>>>>>>>> 16 HW IRQ lines. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Duc Dang <dhdang@xxxxxxx> >>>>>>>>>> Signed-off-by: Tanmay Inamdar <tinamdar@xxxxxxx> >>>>>>>>>> --- >>>>>>>>>> drivers/pci/host/Kconfig | 6 + >>>>>>>>>> drivers/pci/host/Makefile | 1 + >>>>>>>>>> drivers/pci/host/pci-xgene-msi.c | 407 >>>>>>>>>> +++++++++++++++++++++++++++++++++++++++ >>>>>>>>>> drivers/pci/host/pci-xgene.c | 21 ++ >>>>>>>>>> 4 files changed, 435 insertions(+) >>>>>>>>>> create mode 100644 drivers/pci/host/pci-xgene-msi.c >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig >>>>>>>>>> index 7b892a9..c9b61fa 100644 >>>>>>>>>> --- a/drivers/pci/host/Kconfig >>>>>>>>>> +++ b/drivers/pci/host/Kconfig >>>>>>>>>> @@ -89,11 +89,17 @@ config PCI_XGENE >>>>>>>>>> depends on ARCH_XGENE >>>>>>>>>> depends on OF >>>>>>>>>> select PCIEPORTBUS >>>>>>>>>> + select PCI_MSI_IRQ_DOMAIN if PCI_MSI >>>>>>>>>> + select PCI_XGENE_MSI if PCI_MSI >>>>>>>>>> help >>>>>>>>>> Say Y here if you want internal PCI support on APM >>>>>>>>>> X-Gene SoC. There are 5 internal PCIe ports available. Each port >>>>>>>>>> is GEN3 capable >>>>>>>>>> and have varied lanes from x1 to x8. >>>>>>>>>> >>>>>>>>>> +config PCI_XGENE_MSI >>>>>>>>>> + bool "X-Gene v1 PCIe MSI feature" >>>>>>>>>> + depends on PCI_XGENE && PCI_MSI >>>>>>>>>> + >>>>>>>>>> config PCI_LAYERSCAPE >>>>>>>>>> bool "Freescale Layerscape PCIe controller" >>>>>>>>>> depends on OF && ARM >>>>>>>>>> diff --git a/drivers/pci/host/Makefile >>>>>>>>>> b/drivers/pci/host/Makefile index e61d91c..f39bde3 100644 >>>>>>>>>> --- a/drivers/pci/host/Makefile >>>>>>>>>> +++ b/drivers/pci/host/Makefile >>>>>>>>>> @@ -11,5 +11,6 @@ obj-$(CONFIG_PCIE_SPEAR13XX) += >>>>>>>>>> pcie-spear13xx.o obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o >>>>>>>>>> pci-keystone.o obj-$(CONFIG_PCIE_XILINX) += pcie-xilinx.o >>>>>>>>>> obj-$(CONFIG_PCI_XGENE) += pci-xgene.o >>>>>>>>>> +obj-$(CONFIG_PCI_XGENE_MSI) += pci-xgene-msi.o >>>>>>>>>> obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o >>>>>>>>>> obj-$(CONFIG_PCI_VERSATILE) += pci-versatile.o >>>>>>>>>> diff --git a/drivers/pci/host/pci-xgene-msi.c >>>>>>>>>> b/drivers/pci/host/pci-xgene-msi.c >>>>>>>>>> new file mode 100644 >>>>>>>>>> index 0000000..4f0ff42 >>>>>>>>>> --- /dev/null >>>>>>>>>> +++ b/drivers/pci/host/pci-xgene-msi.c >>>>>>>>>> @@ -0,0 +1,407 @@ >>>>>>>>>> +/* >>>>>>>>>> + * APM X-Gene MSI Driver >>>>>>>>>> + * >>>>>>>>>> + * Copyright (c) 2014, Applied Micro Circuits Corporation >>>>>>>>>> + * Author: Tanmay Inamdar <tinamdar@xxxxxxx> >>>>>>>>>> + * Duc Dang <dhdang@xxxxxxx> >>>>>>>>>> + * >>>>>>>>>> + * This program is free software; you can redistribute it >>>>>>>>>> and/or modify it >>>>>>>>>> + * under the terms of the GNU General Public License as >>>>>>>>>> published by the >>>>>>>>>> + * Free Software Foundation; either version 2 of the License, >>>>>>>>>> or (at your >>>>>>>>>> + * option) any later version. >>>>>>>>>> + * >>>>>>>>>> + * This program is distributed in the hope that it will be >>>>>>>>>> useful, >>>>>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty >>>>>>>>>> of >>>>>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>>>>>>>>> + * GNU General Public License for more details. >>>>>>>>>> + */ >>>>>>>>>> +#include <linux/interrupt.h> >>>>>>>>>> +#include <linux/module.h> >>>>>>>>>> +#include <linux/msi.h> >>>>>>>>>> +#include <linux/of_irq.h> >>>>>>>>>> +#include <linux/pci.h> >>>>>>>>>> +#include <linux/platform_device.h> >>>>>>>>>> +#include <linux/of_pci.h> >>>>>>>>>> + >>>>>>>>>> +#define MSI_INDEX0 0x000000 >>>>>>>>>> +#define MSI_INT0 0x800000 >>>>>>>>>> + >>>>>>>>>> +struct xgene_msi_settings { >>>>>>>>>> + u32 index_per_group; >>>>>>>>>> + u32 irqs_per_index; >>>>>>>>>> + u32 nr_msi_vec; >>>>>>>>>> + u32 nr_hw_irqs; >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> +struct xgene_msi { >>>>>>>>>> + struct device_node *node; >>>>>>>>>> + struct msi_controller mchip; >>>>>>>>>> + struct irq_domain *domain; >>>>>>>>>> + struct xgene_msi_settings *settings; >>>>>>>>>> + u32 msi_addr_lo; >>>>>>>>>> + u32 msi_addr_hi; >>>>>>>>> >>>>>>>>> >>>>>>>>> I'd rather see the mailbox address directly, and only do the >>>>>>>>> split when assigning it to the message (you seem to play all kind >>>>>>>>> of tricks on the address anyway). >>>>>>>> >>>>>>>> >>>>>>>> msi_addr_lo and msi_addr_hi store the physical base address of MSI >>>>>>>> controller registers. I will add comment to clarify this. >>>>>>> >>>>>>> >>>>>>> What I mean is that there is no point in keeping this around as a >>>>>>> pair of 32bit variables. You'd better keep it as a single 64bit, >>>>>>> and do the split when assigning it the the MSI message. >>>>>> >>>>>> Hi Marc, >>>>>> >>>>>> These came from device-tree (which describes 64-bit address number as >>>>>> 2 32-bit words). >>>>> >>>>> ... and converted to a resource as a 64bit word, on which you apply >>>>> {upper,lower}_32_bit(). So much for DT... >>>>> >>>>>> If I store them this way, I don't need CPU cycles to do the split >>>>>> every time assigning them to the MSI message. Please let me know what >>>>>> do you think about it. >>>>> >>>>> This is getting absolutely silly. >>>>> >>>>> How many cycles does it take to execute "lsr x1, x0, #32" on X-Gene? If >>>>> it takes so long that it is considered to be a bottleneck, I suggest >>>>> you go and design a better CPU (hint: the answer is probably 1 cycle >>>>> absolutely everywhere). >>>>> >>>>> How often are you configuring MSIs in the face of what is happening in >>>>> the rest of the kernel? Almost never! >>>>> >>>>> So, given that "never" times 1 is still never, I'll consider that >>>>> readability of the code trumps it anytime (I can't believe we're having >>>>> that kind of conversation...). >>>>> >>>> I changed to use u64 for msi_addr and split it when composing MSI messages. >>>> The change is in v4 of the patch set that I just posted. >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>>>>> +static int xgene_msi_set_affinity(struct irq_data *irq_data, >>>>>>>>>> + const struct cpumask *mask, >>>>>>>>>> bool force) >>>>>>>>>> +{ >>>>>>>>>> + struct xgene_msi *msi = >>>>>>>>>> irq_data_get_irq_chip_data(irq_data); >>>>>>>>>> + unsigned int gic_irq; >>>>>>>>>> + int ret; >>>>>>>>>> + >>>>>>>>>> + gic_irq = msi->msi_virqs[irq_data->hwirq % >>>>>>>>>> msi->settings->nr_hw_irqs]; >>>>>>>>>> + ret = irq_set_affinity(gic_irq, mask); >>>>>>>>> >>>>>>>>> >>>>>>>>> Erm... This as the effect of moving *all* the MSIs hanging off >>>>>>>>> this interrupt to another CPU. I'm not sure that's an acceptable >>>>>>>>> effect... What if another MSI requires a different affinity? >>>>>>>> >>>>>>>> >>>>>>>> We have 16 'real' hardware IRQs. Each of these has multiple MSIs >>>>>>>> attached to it. >>>>>>>> So this will move all MSIs handing off this interrupt to another >>>>>>>> CPU; and we don't support different affinity settings for >>>>>>>> different MSIs that are attached to the same hardware IRQ. >>>>>>> >>>>>>> >>>>>>> Well, that's a significant departure from the expected behaviour. >>>>>>> In other words, I wonder how useful this is. Could you instead >>>>>>> reconfigure the MSI itself to hit the right CPU (assuming you don't >>>>>>> have more than 16 CPUs and if >>>>>>> that's only for XGene-1, this will only be 8 at most)? This would >>>>>>> reduce your number of possible MSIs, but at least the semantics of >>>>>>> set_afinity would be preserved. >>>>>> >>>>>> X-Gene-1 supports 2688 MSIs that are divided into 16 groups, each >>>>>> group has 168 MSIs that are mapped to 1 hardware GIC IRQ (so we have >>>>>> 16 hardware GIC IRQs for 2688 MSIs). >>>>> >>>>> We've already established that. >>>>> >>>>>> Setting affinity of single MSI to deliver it to a target CPU will move >>>>>> all the MSIs mapped to the same GIC IRQ to that CPU as well. This is >>>>>> not a standard behavior, but limiting the total number of MSIs will >>>>>> cause a lot of devices to fall back to INTx (with huge performance >>>>>> penalty) or even fail to load their driver as these devices request >>>>>> more than 16 MSIs during driver initialization. >>>>> >>>>> No, I think you got it wrong. If you have 168 MSIs per GIC IRQ, and >>>>> provided that you have 8 CPUs (XGene-1), you end up with 336 MSIs per >>>>> CPU (having statically assigned 2 IRQs per CPU). >>>>> >>>>> Assuming you adopt my scheme, you still have a grand total of 336 MSIs >>>>> that can be freely moved around without breaking any userspace >>>>> expectations. >>>>> >>>> Thanks Marc. This is a very good idea. >>>> >>>> But to move MSIs around, I need to change MSI termination address and data >>>> and write them to device configuration space. This may cause problems >>>> if the device >>>> fires an interrupt at the same time when I do the config write? >>>> >>>> What is your opinion here? >>> >>> There is an inherent race when changing the affinity of any interrupt, >>> whether that's an MSI or not. The kernel is perfectly prepared to handle >>> such a situation (to be honest, the kernel doesn't really care). >>> >>> The write to the config space shouldn't be a concern. You will either >>> hit the old *or* the new CPU, but that race is only during the write >>> itself (you can read back from the config space if you're really >>> paranoid). By the time you return from this read/write, the device will >>> be reconfigured. >>> >>>>> I think that 336 MSIs is a fair number (nowhere near the 16 you claim). >>>>> Most platforms are doing quite well with that kind of numbers. Also, >>>>> you don't have to allocate all the MSIs a device can possibly claim (up >>>>> to 2048 MSI-X per device), as they are all perfectly capable of using >>>>> less MSI without having to fallback to INTx). >>>>> >>>>>> I can document the limitation in affinity setting of X-Gene-1 MSI in >>>>>> the driver to hopefully not make people surprise and hope to keep the >>>>>> total number of supported MSI as 2688 so that we can support as many >>>>>> cards that require MSI/MSI-X as possible. >>>>> >>>>> I don't think this is a valid approach. This breaks userspace (think of >>>>> things like irqbalance), and breaks the SMP affinity model that Linux >>>>> uses. No amount of documentation is going to solve it, so I think you >>>>> just have to admit that the HW is mis-designed and do the best you can >>>>> to make it work like Linux expect it to work. >>>>> >>>>> The alternative would to disable affinity setting altogether instead of >>>>> introducing these horrible side effects. >>>>> >>>> I have it disabled (set_affinity does nothing) in my v4 patch. >>> >>> It would be good if you could try the above approach. It shouldn't be >>> hard to write, and it would be a lot better than just ignoring the problem. >>> >> Yes. I am working on this change. > > In which case, I'll wait for a v5. No need to spend time on something > that is going to change anyway. Marc: Is it possible to use the pci_msi_create_default_irq_domain for the X-Gene MSI driver. Or is this going to be removed in the future. Thanks > > Thanks, > > M. > -- > Jazz is not dead. It just smells funny... -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html