Hi Thomas, Thanks for your comments. Please refer to inline comments below. On 2014/5/28 3:58, Thomas Gleixner wrote: > Jiang, > > On Tue, 27 May 2014, Jiang Liu wrote: > >> +static int alloc_irq_from_domain(struct irq_domain *domain, u32 gsi, int pin) >> { >> + int irq = -1; >> + >> + if (gsi >= arch_dynirq_lower_bound(0)) { >> + irq = irq_create_mapping(domain, pin); >> + } else if (gsi < NR_IRQS_LEGACY) { >> + if (!ioapic_identity_map) >> + irq = irq_create_mapping(domain, pin); >> + else if (irq_domain_associate(domain, gsi, pin) == 0) >> + irq = gsi; >> + } else if (irq_create_strict_mappings(domain, gsi, pin, 1) == 0) { >> + irq = gsi; >> + } > > So you have these cases covered here: > > 1) The ACPI case of secondary ioapics. You only have the strict 1:1 > mapping for the first ioapic > > 2) The gsi < NR_IRQS_LEGACY case where you have two options: > > a) Let the core create a random virq number if ioapic_identity_map > is 0 > > ioapic_identity_map is only set by SFI and devicetree > > So in all other cases we fall into that code path for all > legacy interrupts. So how is that supposed to work lets say for > i8042 which has hardcoded irq 1 and 12? > > irq_create_mapping(1) > > hint = 1 % nr_irqs; --> 1 > virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node)); > > This returns something >= 16, because the irq descriptors > for 0-15 (LEGACY) are allocated already. > > The pin association works, but how is the i8042 driver supposed > to figure out that it should request the virq >=16 which was > created instead of the hardcoded 1 ? This is used to work around special non-ISA interrupts with GSI below NR_IRQS_LEGACY. The original code for the special case is: /* * Provide an identity mapping of gsi == irq except on truly * weird platforms that have non isa irqs in the first 16 gsis. */ return gsi >= NR_IRQS_LEGACY ? gsi : gsi_top + gsi; We have one path to handle ISA IRQs before calling alloc_irq_from_domain() as below: if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci)) return mp_irqs[idx].srcbusirq; > > b) Associate the gsi and the pin > > This only works because the virqs are already allocated at boot > time unconditionally due to arch_probe_nr_irqs() returning > NR_IRQS_LEGACY. So irq_domain_associate() works. > Undocumented works by chance behaviour. Yes. It's a good suggestion to enhance legacy_pic to make this code more clear. > > 3) The case where gsi < arch_dynirq_lower_bound() > > You create a strict mapping here, fine. > > This is confusing at best. > > First of all, we should use legacy_pic->nr_legacy_irqs instead of > NR_IRQS_LEGACY all over the place. > > mshyperv, ce4100 and intel-mid use the null_legacy_pic which has > nr_legacy_irqs = 0 and everything else uses the real pic which has > nr_legacy_irqs = NR_IRQS_LEGACY. So why do we even bother to allocate > and deal with NR_IRQS_LEGACY in the cases where we have no legacy? I'm not sure whether it works with ce4100, so used NR_IRQS_LEGACY instead of legacy_pic->nr_legacy_irqs for safety. Will try to refine it in next version. > > ce4100 is an oddball though. The ioapic is registered way before the > interrupt subsystem is initialized and I have a hard time to > understand that comment: > > /* We can't set this earlier, because we need to calibrate the timer */ > legacy_pic = &null_legacy_pic; I haven't figured out the story behind the comment yet:( > > The timer calibration happens after the interrupts are set up. I > assume it's check_timer() which wants that, but we know exactly how > the ce4100 works, so we might be able to avoid that whole "testing" > stuff. Sebastian, any input on this? > > If it turns out that ce4100 needs the inital real legacy pic for some > magic reason we still can be clever by extending the legacy pic data > structure to tell us about that change, i.e. instead of using > legacy_pic->nr_legacy_irqs having a field "nr_allocated_irqs", which > is set to NR_IRQS_LEGACY for the real pic and to 0 for the null_pic > and let ce4100 set that field to NR_IRQS_LEGACY before switching the > legacy_pic over to the null implementation. Good suggestion, will try this way. > But what's really disgusting is the magic ioapic_identity_map and the > extra ACPI specific ioapic_dynirq_base hackery. > > Why do we need strict mappings in the non ACPI case for all ioapic > pins? What's so different about ACPI? Or is this just to avoid > breaking the existing SFI/devicetree stuff. If that's the reason I'm > fine with it, but ... It's to avoid breaking SFI/intel_mid stuff. intel_mid assumes IRQ number equals to pin number and use pci_dev->irq to save both IRQ number and pin number. > > independent of this question we want to be more clever about the > handling of strict, legacy and freely associated linux irq numbers. > > So you have this weird ioapic_create_domain_fn callback in > mp_register_ioapic, which is solely there so the different callers can > hand in their domain ops and eventually set the magic > ioapic_identity_map flag. > > +void __init mp_register_ioapic(int id, u32 address, u32 gsi_base, > + ioapic_create_domain_fn cb, void *arg) > > What about having > > struct ioapic_domain { > struct irqdomain *domain; > const struct irq_domain_ops *ops; > void *arg; > enum domain_type type; > }; > > and add this struct to the ioapic struct. type is: > > enum domain_type { > IOAPIC_STRICT, > IOAPIC_LEGACY, > IOAPIC_DYNAMIC, > }; > > > Now the register function changes to: > > void mp_register_ioapic(int id, u32 address, u32 gsi_base, > const struct irq_domain_ops *ops, > int type) > { > ... > > ioapics[idx].irqdomain.ops = ops; > ioapics[idx].irqdomain.arg = arg; > ioapics[idx].irqdomain.type = type; > ioapics[idx].irqdomain.domain = NULL; > > ... > } > > and you can use mp_irqdomain_create() unconditionally for creating all > domains. And there you do: > > static int dynirq_lower_bound; > > mp_irqdomain_create() > { > ioapic->irqdomain.domain = irq_domain_add_linear(...); > > switch (ioapic->irqdomain.type) { > case IOAPIC_LEGACY: > /* > * Associate the legacy interrupts which have been > * already allocated. > */ > for (i = 0; i < legacy_pic->nr_allocated_irqs; i++) > irq_domain_associate(domain, i, i); > > case IOAPIC_STRICT: > dynirq_lower_bound += ioapic->nr_gsis; > > case IOAPIC_DYNAMIC: > break; > } > } > > So arch_dynirq_lower_bound() gets simplified to: > { > return dynirq_lower_bound; > } > > And alloc_irq_from_domain() becomes: > > int alloc_irq_from_domain(struct ioapic_domain *domain, int gsi, int pin) > { > switch (domain->type) { > case IOAPIC_DYNAMIC: > return irq_create_mapping(domain->domain, pin); > case IOAPIC_LEGACY: > case IOAPIC_STRICT: > return irq_create_strict_mappings(domain, gsi, pin, 1); > } > } > > At the call site of alloc_irq_from_domain() you have already: > > irq = irq_find_mapping(domain, pin); > if (irq <=0 ....) > alloc_irq_from_domain(domain, gsi, pin); > > So because we associated the legacy_pic->nr_allocated_irqs in > mp_irqdomain_create() already, you'll never call into > alloc_irq_from_domain() for those and the remaining ones for that > first ioapic are handled by the IOAPIC_STRICT fall through. > > For simplicity you can let SFI and devicetree register the ioapics > with their specific domain ops plus IOAPIC_LEGACY for the first ioapic > and IOAPIC_STRICT for all others. That also covers the case where the > null_legacy_pic with legacy_pic->nr_allocated_irqs == 0 is used. > > In the ACPI case you register with the acpi domain ops and > IOAPIC_LEGACY for the first and IOAPIC_DYNAMIC for the extra ioapics. > > Should be way cleaner and understandable, at least to me :) Really good suggestions! Make thing much more clear. > > Now there is that last oddity which bugs me in mp_map_pin_to_irq() > > /* > * Don't use irqdomain to manage ISA IRQs because there may be > * multiple IOAPIC pins sharing the same ISA IRQ number and > * irqdomain only supports 1:1 mapping between IOAPIC pin and > * IRQ number. > */ > if (idx >= 0 && test_bit(mp_irqs[idx].srcbus, mp_bus_not_pci)) { > irq = mp_irqs[idx].srcbusirq; > if ((flags & IOAPIC_MAP_ALLOC) && info->count == 0 && > mp_irqdomain_map(domain, irq, pin) != 0) > irq = -1; > > That really looks like a hack. I'm aware that the current irqdomain > code cannot deal with that oddball case. > > So what you are saying is that there are devices which have a separate > physical wire to different ioapic pins, but the ioapic is supposed to > bundle them to a shared interrupt. > > I agree that this is odd enough to handle at the ioapic level, but > it'd be nice to have a more elaborative comment on this. Will try to improve the comment. > > Aside of the above I'm pretty happy about the progress of this patch > set. One thing, which needs to be looked at are the usage sites of > irq_data->irq, whether they are safe. I did not spot any unsafe ones, > but a few functions which are called with irq_data->irq and make no > use of it. Sure, will check usages of irq_data->irq. Really appreciate you suggestions, it will improve the code a lot. Thanks! Gerry > > Thanks, > > tglx > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html