On Wed, May 31, 2017 at 04:18:44PM +0200, Mason wrote: > On 24/05/2017 12:22, Marc Zyngier wrote: > > On 24/05/17 11:00, Robin Murphy wrote: > >> On 23/05/17 20:15, Mason wrote: > >>> On 23/05/2017 20:03, Robin Murphy wrote: > >>>> On 23/05/17 18:54, Mason wrote: > >>>>> On 23/05/2017 19:03, Bjorn Helgaas wrote: > >>>>>> On Wed, May 17, 2017 at 04:56:08PM +0200, Marc Gonzalez wrote: > >>>>>>> On 20/04/2017 16:28, Marc Gonzalez wrote: > >>>>>>> > >>>>>>>> +static int tango_set_affinity(struct irq_data *data, > >>>>>>>> + const struct cpumask *mask, bool force) > >>>>>>>> +{ > >>>>>>>> + return -EINVAL; > >>>>>>>> +} > >>>>>>>> + > >>>>>>>> +static struct irq_chip tango_chip = { > >>>>>>>> + .irq_ack = tango_ack, > >>>>>>>> + .irq_mask = tango_mask, > >>>>>>>> + .irq_unmask = tango_unmask, > >>>>>>>> + .irq_set_affinity = tango_set_affinity, > >>>>>>>> + .irq_compose_msi_msg = tango_compose_msi_msg, > >>>>>>>> +}; > >>>>>>> > >>>>>>> Hmmm... I'm wondering why .irq_set_affinity is required. > >>>>>>> > >>>>>>> static int setup_affinity(struct irq_desc *desc, struct cpumask *mask) > >>>>>>> first calls __irq_can_set_affinity() to check whether > >>>>>>> desc->irq_data.chip->irq_set_affinity) exists. > >>>>>>> > >>>>>>> then calls irq_do_set_affinity(&desc->irq_data, mask, false); > >>>>>>> which calls chip->irq_set_affinity(data, mask, force); > >>>>>>> = msi_domain_set_affinity() > >>>>>>> which calls parent->chip->irq_set_affinity() unconditionally. > >>>>>>> > >>>>>>> Would it make sense to test that the callback is implemented > >>>>>>> before calling it? > >>>>>>> > >>>>>>> > >>>>>>> [ 0.723895] Unable to handle kernel NULL pointer dereference at virtual address 00000000 > >>>>>> > >>>>>> I'm not sure what you're asking. > >>>>>> > >>>>>> Is this a bug report for the v4 tango driver? > >>>>> > >>>>> No. > >>>>> > >>>>>> Or are you asking whether msi_domain_set_affinity() should be changed > >>>>>> to check whether parent->chip->irq_set_affinity is implemented? > >>>>> > >>>>> Yes. The way things are implemented now, drivers are forced > >>>>> to define an irq_set_affinity callback, even if it just returns > >>>>> an error, otherwise, the kernel crashes, because of the > >>>>> unconditional function pointer deref. > >>>>> > >>>>>> msi_domain_set_affinity() has called parent->chip->irq_set_affinity() > >>>>>> without checking since it was added in 2014 by f3cf8bb0d6c3 ("genirq: Add > >>>>>> generic msi irq domain support"), so if there's a problem here, it's most > >>>>>> likely in the tango code. > >>>>> > >>>>> The issue is having to define an "empty" function. > >>>>> (Unnecessary code bloat and maintenance.) > >>>> > >>>> AFAICS, only one driver (other than this one) implements a "do nothing" > >>>> set_affinity callback - everyone else who doesn't do anything hardware > >>>> specific just passes it along via irq_chip_set_affinity_parent(). > >>> > >>> I counted 4. Where did I mess up? > >>> > >>> advk_msi_set_affinity > >>> altera_msi_set_affinity > >>> nwl_msi_set_affinity > >>> vmd_irq_set_affinity > >>> tango_set_affinity > >> > >> Fair point - I went through drivers/irqchip and the various > >> arch-specific instances and found ls_scfg_msi_set_affinity(), but > >> somehow skipped over drivers/pci. Anyway, I think the question stands of > >> why are these handful of drivers *not* using irq_chip_set_affinity_parent()? > > > > Probably because they don't have a parent, in the hierarchical sense. > > All they have is a chained interrupt (*puke*). Which implies that > > changing the affinity of one MSI would affect all of them, completely > > confusing unsuspecting userspace such as irqbalance. > > > >> As an outsider, it naively seems logical that the affinity of an MSI > >> which just gets translated to a wired interrupt should propagate through > >> to the affinity of that wired interrupt, but maybe there are reasons not > >> to; I really don't know. > > > > See above. The main issue is that HW folks haven't understood the actual > > use of MSIs, and persist in implementing them as an afterthought, based > > on some cascading interrupt controller design. > > > > Sad. So sad. > > For the record, below is the patch I had in mind: > > diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c > index 8a3e872798f3..edfc95575a37 100644 > --- a/kernel/irq/msi.c > +++ b/kernel/irq/msi.c > @@ -91,9 +91,11 @@ int msi_domain_set_affinity(struct irq_data *irq_data, > { > struct irq_data *parent = irq_data->parent_data; > struct msi_msg msg; > - int ret; > + int ret = -EINVAL; > + > + if (parent->chip->irq_set_affinity) > + ret = parent->chip->irq_set_affinity(parent, mask, force); > > - ret = parent->chip->irq_set_affinity(parent, mask, force); > if (ret >= 0 && ret != IRQ_SET_MASK_OK_DONE) { > BUG_ON(irq_chip_compose_msi_msg(irq_data, &msg)); > irq_chip_write_msi_msg(irq_data, &msg); > > > > Then, it would be safe to remove "do-nothing" .irq_set_affinity > callbacks in drivers/pci/host I assume it's obvious that nobody else is going to take this and run with it, so nothing will happen with this unless you finish it up by doing the cleanup (removing the "do-nothing" callbacks) and adding a changelog and signed-off-by. This would be more an IRQ patch than a PCI patch, but if I were reviewing it, I would look for assurance that *all* the no-op .irq_set_affinity callbacks were cleaned up, not just those in drivers/pci/host. Bjorn