On Wed, 26 Aug 2020 12:16:32 +0100, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > The documentation of irq_chip_compose_msi_msg() claims that with > hierarchical irq domains the first chip in the hierarchy which has an > irq_compose_msi_msg() callback is chosen. But the code just keeps > iterating after it finds a chip with a compose callback. > > The x86 HPET MSI implementation relies on that behaviour, but that does not > make it more correct. > > The message should always be composed at the domain which manages the > underlying resource (e.g. APIC or remap table) because that domain knows > about the required layout of the message. > > On X86 the following hierarchies exist: > > 1) vector -------- PCI/MSI > 2) vector -- IR -- PCI/MSI > > The vector domain has a different message format than the IR (remapping) > domain. So obviously the PCI/MSI domain can't compose the message without > having knowledge about the parent domain, which is exactly the opposite of > what hierarchical domains want to achieve. > > X86 actually has two different PCI/MSI chips where #1 has a compose > callback and #2 does not. #2 delegates the composition to the remap domain > where it belongs, but #1 does it at the PCI/MSI level. > > For the upcoming device MSI support it's necessary to change this and just > let the first domain which can compose the message take care of it. That > way the top level chip does not have to worry about it and the device MSI > code does not need special knowledge about topologies. It just sets the > compose callback to NULL and lets the hierarchy pick the first chip which > has one. > > Due to that the attempt to move the compose callback from the direct > delivery PCI/MSI domain to the vector domain made the system fail to boot > with interrupt remapping enabled because in the remapping case > irq_chip_compose_msi_msg() keeps iterating and choses the compose callback > of the vector domain which obviously creates the wrong format for the remap > table. > > Break out of the loop when the first irq chip with a compose callback is > found and fixup the HPET code temporarily. That workaround will be removed > once the direct delivery compose callback is moved to the place where it > belongs in the vector domain. > > Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > --- > V2: New patch. Note, that this might break other stuff which relies on the > current behaviour, but the hierarchy composition of DT based chips is > really hard to follow. Grepping around, I don't think there is any occurrence of two irqchips providing irq_compose_msi() that can share a hierarchy on any real system, so we should be fine. Famous last words. > --- > arch/x86/kernel/apic/msi.c | 7 +++++-- > kernel/irq/chip.c | 12 +++++++++--- > 2 files changed, 14 insertions(+), 5 deletions(-) > > --- a/arch/x86/kernel/apic/msi.c > +++ b/arch/x86/kernel/apic/msi.c > @@ -479,10 +479,13 @@ struct irq_domain *hpet_create_irq_domai > info.type = X86_IRQ_ALLOC_TYPE_HPET; > info.hpet_id = hpet_id; > parent = irq_remapping_get_ir_irq_domain(&info); > - if (parent == NULL) > + if (parent == NULL) { > parent = x86_vector_domain; > - else > + } else { > hpet_msi_controller.name = "IR-HPET-MSI"; > + /* Temporary fix: Will go away */ > + hpet_msi_controller.irq_compose_msi_msg = NULL; > + } > > fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name, > hpet_id); > --- a/kernel/irq/chip.c > +++ b/kernel/irq/chip.c > @@ -1544,10 +1544,16 @@ int irq_chip_compose_msi_msg(struct irq_ > struct irq_data *pos = NULL; > > #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY > - for (; data; data = data->parent_data) > -#endif > - if (data->chip && data->chip->irq_compose_msi_msg) > + for (; data; data = data->parent_data) { > + if (data->chip && data->chip->irq_compose_msi_msg) { > pos = data; > + break; > + } > + } > +#else > + if (data->chip && data->chip->irq_compose_msi_msg) > + pos = data; > +#endif > if (!pos) > return -ENOSYS; > > > Is it just me, or is this last change more complex than it ought to be? diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index 857f5f4c8098..25e18b73699c 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -1544,7 +1544,7 @@ int irq_chip_compose_msi_msg(struct irq_data *data, struct msi_msg *msg) struct irq_data *pos = NULL; #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY - for (; data; data = data->parent_data) + for (; data && !pos; data = data->parent_data) #endif if (data->chip && data->chip->irq_compose_msi_msg) pos = data; Though the for loop in a #ifdef in admittedly an acquired taste... Reviewed-by: Marc Zyngier <maz@xxxxxxxxxx> M. -- Without deviation from the norm, progress is not possible.