Yinghai,

On Thu, 2 Jan 2014, Yinghai Lu wrote:

> For ioapic hot-add support, it would be easy if we have continuous
> irq numbers for hot added ioapic controller.

I really don't care about easy. Easy to solve problems are for
wimps.

What you really want to say is that ioapic hot-add support requires a
contiguous irq number space for a hotplugged ioapic to avoid expensive
translations in the ioapic hotplug code.

That's a proper reason for making that change to the core code.

> We can reserve irq range at first, and later allocate desc for those
> pre-reserved irqs when they are needed.
>
> The reasons for not allocating them during reserving:
> 1. only several pins of one ioapic are used, allocate for all pins, will
>    waste memory for not used pins.
> 2. allocate later when is needed could make sure irq_desc is allocated
>    on local node ram, as dev->node is set at that point.
>
> -v2: update changelog by adding reasons, requested by Konrad.
> -v3: according to tglx:
>      separate core code change with arch code change.

Thanks for splitting the patches!

Now the scope of this change becomes more obvious and what I already
suspected before becomes crystal clear.

The initial intention of irq_reserve_irqs() was to cope with the
legacy interrupts to prevent the dynamic allocator from giving them
out, but it was at least a misnomer if not even a misconception.

Did you notice that? No!

Did you even think why irq_reserve_irqs() exists? No!

You just hacked it into submission for your purpose. As usual, sigh!

What prevents a user of __irq_alloc_reserved_desc() from requesting
something completely out of its range?

Nothing, as you happily return an existing interrupt via:

+	if (irq_to_desc(irq))
+		return irq;

which is true for all already existing interrupts.

So some random off by one is going to cause a spurious and extremely
hard to debug issue. Brilliant.

No, we are not going to play the "it works for Yinghai" game again. I
wasted enough time with that already.

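The off by one hazard with the irq_to_desc() shortcut can be
illustrated with a small userspace model. This is a sketch under
assumptions, not kernel code: the model_* names, the 64 entry irq
space and the 16..39 reserved range are all hypothetical, and only the
early return mirrors the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS 64	/* hypothetical size of the irq space */

/* Toy state: which irq numbers already have a descriptor, and which
 * numbers were reserved for the hot-added ioapic. */
static bool model_desc_present[MODEL_NR_IRQS];
static bool model_reserved[MODEL_NR_IRQS];

/* Assumed layout: legacy irqs 0..15 already exist, and a 24 pin
 * ioapic has reserved irqs 16..39. */
static void model_setup(void)
{
	int i;

	for (i = 0; i < MODEL_NR_IRQS; i++) {
		model_desc_present[i] = i < 16;
		model_reserved[i] = i >= 16 && i < 40;
	}
}

/* Hypothetical stand-in for the patched allocation path. Only the
 * early return mirrors the quoted hunk; the rest is assumed. */
static int model_alloc_reserved_desc(int irq)
{
	if (irq < 0 || irq >= MODEL_NR_IRQS)
		return -1;
	if (model_desc_present[irq])
		return irq;	/* no check that the caller owns this irq */
	if (!model_reserved[irq])
		return -1;
	model_desc_present[irq] = true;
	return irq;
}
```

In this model, after model_setup(), a caller that computes its base as
15 instead of 16 gets irq 15 back as a success, although that number
belongs to the legacy range. Nothing in the return value tells the
caller that it is now sharing somebody else's descriptor.
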
There is a clear step by step approach to get this done properly:

1) Get rid of the existing misconception/misnomer of
   irq_reserve_irqs().

   Make it explicit that this is dealing with legacy irq spaces. It's
   not that hard as there are only two users in tree which are both
   trivial to fix.

2) Provide a proper reservation mechanism which does not piggyback
   blindly on the allocation bitmap.

   So what you want is a reservation which:

   A) Marks the irq range in the allocation bitmap

      This prevents other code paths from stomping on that range.

   B) Stores a unique generated ID in a separate radix tree for that
      particular irq range.

      The generated ID is returned to the caller as it is required for
      actually allocating an interrupt from that range.

      We don't have to bother with making this conditional as the
      initial memory consumption of the radix tree is minimal and we
      only expand it when we actually use that hotplug feature.

3) Provide a proper alloc_reserved_irqdesc() function

   This function verifies against the reservation ID which was handed
   out by the reservation function.

   It's questionable whether we want to allow the reuse of already
   allocated irq descriptors. I'm leaning to avoid that. See #4

4) Provide a proper mechanism to free the registered irq descriptors
   and the reservation range when the physical device is removed from
   the system.

   So you don't have to preserve state in the ioapic code. Physical
   hotplug is not a high frequency hotpath operation.

5) Modify the x86 ioapic code to always use the reserve first and
   allocate later mechanism to avoid ifdeffery and pointless
   conditional code paths. That also ensures proper test coverage.

TBH, I could not be bothered to look at your x86 related changes, but
I expect they are from the "make it work for Yinghai" department as
well. I'll review them once the core code changes are in an acceptable
shape.

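Steps 2 to 4 above can be sketched in plain userspace C. This is an
illustration under assumptions, not a proposed kernel implementation:
every sketch_* name is hypothetical, bool arrays stand in for the
allocation bitmap and the descriptors, and a small fixed table stands
in for the radix tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_IRQS	64	/* hypothetical irq space */
#define SKETCH_MAX_RESV	8	/* hypothetical reservation limit */

struct sketch_resv {
	bool in_use;
	unsigned int id;	/* handed back to the caller */
	int start;
	int cnt;
};

static bool sketch_allocated[SKETCH_NR_IRQS];	 /* models the bitmap */
static bool sketch_desc_present[SKETCH_NR_IRQS]; /* models descriptors */
static struct sketch_resv sketch_table[SKETCH_MAX_RESV]; /* models the tree */
static unsigned int sketch_next_id = 1;	/* 0 means failure; wrap ignored */

static struct sketch_resv *sketch_find(unsigned int id)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_RESV; i++)
		if (sketch_table[i].in_use && sketch_table[i].id == id)
			return &sketch_table[i];
	return NULL;
}

/* Step 2: mark the range and hand out an ID. Returns 0 on failure. */
static unsigned int sketch_reserve_range(int start, int cnt)
{
	struct sketch_resv *slot = NULL;
	size_t s;
	int i;

	if (start < 0 || cnt <= 0 || cnt > SKETCH_NR_IRQS - start)
		return 0;
	for (i = start; i < start + cnt; i++)
		if (sketch_allocated[i])
			return 0;	/* range overlaps an existing user */
	for (s = 0; s < SKETCH_MAX_RESV && !slot; s++)
		if (!sketch_table[s].in_use)
			slot = &sketch_table[s];
	if (!slot)
		return 0;
	for (i = start; i < start + cnt; i++)
		sketch_allocated[i] = true;
	slot->in_use = true;
	slot->id = sketch_next_id++;
	slot->start = start;
	slot->cnt = cnt;
	return slot->id;
}

/* Step 3: allocate only inside the range that belongs to the ID. */
static int sketch_alloc_reserved(unsigned int id, int irq)
{
	struct sketch_resv *r = sketch_find(id);

	if (!r || irq < r->start || irq >= r->start + r->cnt)
		return -1;	/* unknown ID or out of range request */
	if (sketch_desc_present[irq])
		return -1;	/* no reuse of existing descriptors */
	sketch_desc_present[irq] = true;
	return irq;
}

/* Step 4: drop all descriptors and the reservation in one go. */
static int sketch_free_reservation(unsigned int id)
{
	struct sketch_resv *r = sketch_find(id);
	int i;

	if (!r)
		return -1;
	for (i = r->start; i < r->start + r->cnt; i++) {
		sketch_desc_present[i] = false;
		sketch_allocated[i] = false;
	}
	r->in_use = false;
	return 0;
}
```

With a reservation of 24 irqs starting at 16, a request for irq 15
under that ID is rejected instead of silently returning an existing
interrupt, and so is a second request for an irq that already has a
descriptor. Freeing by ID releases both the descriptors and the range,
so the caller keeps nothing but the ID.
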
Thanks,

	tglx