Hi Himanshu,

Please set your email line wrap to 72 or 80 columns, otherwise it is
quite hard to reply inline.

On Fri, Dec 21, 2018 at 10:48:10PM +0000, Himanshu Madhani wrote:
> Hi Christoph,
>
> We are facing an issue with masked MSIX vectors while trying to get
> PCI vectors with BLK/SCSI-MQ enabled, when the number of CPUs is
> smaller than the number of available MSIX vectors. For our ISP25xx
> chipset, the hardware supports 32 MSIX vectors with MQ enabled. We
> originally found this issue on a system using the RH8.0 kernel,
> which is based on 4.19. The system that failed has 12 CPUs, and the
> maximum number of MSIX vectors requested was 32.

pci_alloc_irq_vectors_affinity() returns 32 for the qla2xxx driver,
which is expected behaviour, and nr_possible_cpus is actually 32 on
your system too.

> We observed that with the new pci_alloc_irq_vectors_affinity() call
> the driver gets back 32 vectors when the system has only 12 CPUs.
> As far as we understand, this call should have returned at most 14
> MSIX vectors (12 for CPU affinity + 2 reserved in .pre_vectors of
> the irq_affinity structure). Also, we see that the returned vectors
> include masked ones. Since the driver received 32 vectors, we create
> 30 qpairs (2 fewer, for the reserved vectors). In this scenario, we
> observed that on some qpairs the driver is not able to process
> interrupts because the CPUs are masked at the PCI layer. Looking at
> the code, the 'pre/post' vector sets in struct irq_affinity don't
> appear to help here.

Yes, .pre_vectors is 2, which means 30 PCI_IRQ_AFFINITY IO vectors are
returned, and that is still correct behaviour; fewer may be returned
because the system can run out of irq vectors. Especially in this
case, you only have 12 online CPUs, and one vector is enough to
handle the IO from one CPU.

> From the call below, we would expect to get back only
> num_online_cpus() + reserved vectors when requesting a number of
> vectors; instead we get back the number the driver requested.

No, that isn't correct. In theory it is fine for
pci_alloc_irq_vectors_affinity() to return any number of irq vectors,
depending on how many are available. In particular it is workable to
return just the reserved vectors (.pre_vectors plus .post_vectors)
and >= 1 PCI_IRQ_AFFINITY IO vector.

> int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
>                                    unsigned int max_vecs, unsigned int flags,
>                                    const struct irq_affinity *affd)
> {
>         if (flags & PCI_IRQ_MSIX) {
>                 vecs = __pci_enable_msix_range(dev, NULL, min_vecs,
>                                                max_vecs, affd);
>                 if (vecs > 0)
>                         return vecs;
>         }
> }
>
> static int __pci_enable_msix_range(struct pci_dev *dev,
>                                    struct msix_entry *entries, int minvec,
>                                    int maxvec, const struct irq_affinity *affd)
> {
>         for (;;) {
>                 if (affd) {
>                         nvec = irq_calc_affinity_vectors(minvec, nvec, affd);
>                         if (nvec < minvec)
>                                 return -ENOSPC;
>                 }
>         }
>
> which in turn calls irq_calc_affinity_vectors(), which should return
> min of num_online_cpus() + resv:
>
> /**
>  * irq_calc_affinity_vectors - Calculate the optimal number of vectors
>  * @minvec:  The minimum number of vectors available
>  * @maxvec:  The maximum number of vectors available
>  * @affd:    Description of the affinity requirements
>  */
> int irq_calc_affinity_vectors(int minvec, int maxvec, const struct irq_affinity *affd)
> {
>         int resv = affd->pre_vectors + affd->post_vectors;
>         int vecs = maxvec - resv;
>         int ret;
>
>         if (resv > minvec)
>                 return 0;
>
>         get_online_cpus();
>         ret = min_t(int, cpumask_weight(cpu_possible_mask), vecs) + resv;
>         put_online_cpus();
>         return ret;
> }
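Note that irq_calc_affinity_vectors() counts
cpumask_weight(cpu_possible_mask), not online CPUs. Just to make the
arithmetic explicit, here is a tiny standalone sketch (illustration
only, not driver code; calc_vectors() is a made-up name) with your
numbers plugged in, assuming .post_vectors is 0 and 32 possible CPUs
as on your system:

#include <stdio.h>

/* illustration only: mirrors the core calculation of
 * irq_calc_affinity_vectors() quoted above
 */
static int calc_vectors(int maxvec, int pre, int post, int possible_cpus)
{
        int resv = pre + post;
        int vecs = maxvec - resv;

        return (possible_cpus < vecs ? possible_cpus : vecs) + resv;
}

int main(void)
{
        /* 32 MSIX vectors, .pre_vectors = 2, .post_vectors assumed 0,
         * and 32 possible CPUs even though only 12 are online
         */
        printf("%d\n", calc_vectors(32, 2, 0, 32)); /* prints 32, not 14 */
        return 0;
}

So 32 is exactly what the allocation is expected to return here; the
14 you expected would only follow if the calculation were based on
online CPUs.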
>
> We do see the same using the 4.20.0-rc6 kernel. See the table below;
> we experimented by forcing the maxcpus= parameter to expose fewer
> CPUs than the number of vectors requested.
>
> Upstream - 4.20-rc6
>
>                         MaxCPU=   Cores   Result
> MQ Enabled ISP25xx      Unset     48      Pass
> MQ Enabled ISP25xx      2         24      Failed
> MQ Enabled ISP25xx      4         30      Failed
> MQ Enabled ISP27xx      Unset     48      Pass
> MQ Enabled ISP27xx      2         24      Failed
> MQ Enabled ISP27xx      4         30      Failed
>
> Note that the RH8.0 kernel, which has the code from the 4.19 kernel,
> behaves the same way. We have not been able to do extensive testing
> with SLES.
>
> We want to make sure we are reading this code right and that our
> understanding is correct. If not, please advise on the right
> expectations and what changes are needed to address this.
>
> In case our understanding is right, is there a known issue in this
> area in the 4.19 kernel which got addressed in the 4.20-rc6 kernel?
> If yes, can you please point us to the commit? If not, what
> additional data is needed to debug this further? We have captured a
> PCIe trace and ruled out any issues at the hardware/firmware level,
> and we also see that the MSIX vector associated with the queue pair
> where we are not getting interrupts is masked.
>
> We want to understand how to calculate the number of IRQ vectors the
> driver can request in such a scenario.

The irq vector allocation isn't wrong, and your IO hang is probably
caused by not using the correct msix vector (qpair/hardware queue).
For example, 30 IO vectors are returned, and the mapping between CPU
and IO vector may look like the list below; you have to double-check
that the correct msix vector is used. CPUs 0~11 are online, so only
irq 45~55 & 57 should be used. You can see which CPU each request
originated from via rq->mq_ctx->cpu, and the mapping is done by
blk-mq automatically; in particular, blk_mq_unique_tag_to_hwq(tag)
tells you which hardware queue a request is mapped to, from which you
can figure out which msix vector should be used for that hardware
queue (see the sketch after the list below).

irq 45, cpu list 0
irq 46, cpu list 1
irq 47, cpu list 2
irq 48, cpu list 3
irq 49, cpu list 4
irq 50, cpu list 5
irq 51, cpu list 6
irq 52, cpu list 7
irq 53, cpu list 8
irq 54, cpu list 9
irq 55, cpu list 10
irq 57, cpu list 11
irq 58, cpu list 12-13
irq 59, cpu list 14-15
irq 60, cpu list 16
irq 61, cpu list 17
irq 62, cpu list 18
irq 63, cpu list 19
irq 64, cpu list 20
irq 65, cpu list 21
irq 66, cpu list 22
irq 67, cpu list 23
irq 68, cpu list 24
irq 69, cpu list 25
irq 70, cpu list 26
irq 71, cpu list 27
irq 72, cpu list 28
irq 73, cpu list 29
irq 74, cpu list 30
irq 75, cpu list 31
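FWIW, below is a rough and untested sketch of the kind of check I
mean; qla_dbg_check_mapping() and its parameters are made up for
illustration, only blk_mq_unique_tag(), blk_mq_unique_tag_to_hwq()
and blk_mq_unique_tag_to_tag() are real blk-mq helpers:

#include <linux/blk-mq.h>
#include <linux/printk.h>
#include <scsi/scsi_cmnd.h>

/*
 * Call this where the driver picks a qpair for a command, passing the
 * hardware queue index served by the chosen qpair's MSI-X vector.
 */
static void qla_dbg_check_mapping(struct scsi_cmnd *cmd, u16 chosen_hwq)
{
        u32 unique = blk_mq_unique_tag(cmd->request); /* hwq index + tag */
        u16 hwq = blk_mq_unique_tag_to_hwq(unique);   /* hwq blk-mq picked */

        /*
         * blk-mq has already mapped the submitting CPU (rq->mq_ctx->cpu)
         * to 'hwq'; if the command is queued on a qpair whose MSI-X
         * vector serves a different hardware queue, the completion can
         * end up behind a masked vector and the IO hangs.
         */
        if (hwq != chosen_hwq)
                pr_warn_ratelimited("tag %u: blk-mq hwq %u, driver qpair hwq %u\n",
                                    blk_mq_unique_tag_to_tag(unique), hwq,
                                    chosen_hwq);
}

thanks,
Ming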