On Tue, 14 Feb 2017, Thomas Gleixner wrote:
> On Mon, 13 Feb 2017, Linus Torvalds wrote:
> > On Mon, Feb 13, 2017 at 1:35 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >
> > > arch/x86/platform/goldfish/goldfish.c
> > >
> > > static int __init goldfish_init(void)
> > > {
> > > 	platform_device_register_simple("goldfish_pdev_bus", -1,
> > > 					goldfish_pdev_bus_resources, 2);
> > > 	return 0;
> > > }
> > > device_initcall(goldfish_init);
> > >
> > > So it unconditionally registers that platform device, which has IRQ 4 as
> > > irq resource and the driver happily matches on the platform devices.
> > >
> > > Wonderful crap, isn't it? It should be made 'depend on BROKEN'.
> >
> > Ugh. Yeah, that's crazy. Random hardcoded interfaces that get enabled
> > by people by mistake.
> >
> > And yeah, it's not just the irq. It just randomly sets up memory addresses too.
> >
> > That thing needs to be disabled some way. Maybe not marked "broken",
> > but there needs to be something that actually enables it at runtime
> > (like a kernel command line option or something like that).

I prefer to mark it broken, because that's what it is, but we can add a
command line option as well. See below.

> > That said - that code was presumably enabled before too, so why would
> > this break something? And if this is the cause, we need to figure out
> > what it is that it then triggers..
>
> I'm on it ...

When that goldfish bus driver gets an interrupt then it really goes down
the drain:

static irqreturn_t goldfish_pdev_bus_interrupt(int irq, void *dev_id)
{
	irqreturn_t ret = IRQ_NONE;

	while (1) {
		u32 op = readl(pdev_bus_base + PDEV_BUS_OP);

So here it reads from that hardcoded physical address which was handed in
via that platform device.

		switch (op) {
		case PDEV_BUS_OP_DONE:
			return IRQ_NONE;

If 'op' is 0 it returns IRQ_NONE;

		case PDEV_BUS_OP_REMOVE_DEV:
			goldfish_pdev_remove();
			break;

		case PDEV_BUS_OP_ADD_DEV:
			goldfish_new_pdev();
			break;

These two handle device add/remove which pokes more in that address
range. Cute stuff with kmalloc(GFP_ATOMIC) and other nice things inside of
a hard interrupt handler.

		}
		ret = IRQ_HANDLED;

Sets ret to HANDLED, which is pointless because the only exit from this
while(1) loop is via the PDEV_BUS_OP_DONE case, which returns IRQ_NONE !

	}
	return ret;
}

On my machine this loop never breaks because op = 0xffffffff . Which
obviously causes the rcu stalls and whatever.

I haven't seen such an engineering trainwreck in a long time.

So now, the real interesting question remains:

  Why is that ioapic - retrigger patch causing that nonsense to run?

From Linus debug patch we can't see which interrupt line is retriggered and
it's a WARN_ONCE()....

Gabriel, can you please apply the debug patch below instead of the one
Linus sent, so we can get some more information about this. Can you upload
/proc/interrupts as well? The last one I've seen does not have e1000 in it.

Leave GOLDFISH and WBT disabled for now.

Thanks,

	tglx

8<-------------

--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -1098,9 +1098,10 @@ EXPORT_SYMBOL_GPL(irq_chip_set_type_pare
 int irq_chip_retrigger_hierarchy(struct irq_data *data)
 {
 	for (data = data->parent_data; data; data = data->parent_data)
-		if (data->chip && data->chip->irq_retrigger)
+		if (data->chip && data->chip->irq_retrigger) {
+			pr_info("Retrigger %s %u\n", data->chip->name, data->irq);
 			return data->chip->irq_retrigger(data);
-
+		}
 	return 0;
 }
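
For illustration only, and not as a proposed fix: a small userspace sketch
of how the handler loop discussed above could be bounded, so that a bogus
read such as 0xffffffff ends the loop and the return value reflects whether
anything was actually processed. Everything here is an assumption made for
the sketch: struct fake_bus, fake_readl(), bounded_bus_interrupt() and
MAX_OPS_PER_IRQ are hypothetical names, the irqreturn_t enum is a stand-in
for the kernel type, and the REMOVE/ADD opcode values are placeholders
(only DONE == 0 comes from the discussion above).

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel type, not the real definition. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* DONE == 0 is taken from the mail; the other two values are placeholders. */
enum {
	PDEV_BUS_OP_DONE       = 0,
	PDEV_BUS_OP_REMOVE_DEV = 1,
	PDEV_BUS_OP_ADD_DEV    = 2,
};

/* Arbitrary bound chosen for the sketch. */
#define MAX_OPS_PER_IRQ 16

/* Hypothetical register model: replays a script, then repeats 'idle'. */
struct fake_bus {
	const uint32_t *script;
	size_t len;
	size_t pos;
	uint32_t idle;		/* value read once the script is exhausted */
	unsigned int adds;
	unsigned int removes;
};

/* Stands in for readl(pdev_bus_base + PDEV_BUS_OP). */
static uint32_t fake_readl(struct fake_bus *bus)
{
	if (bus->pos < bus->len)
		return bus->script[bus->pos++];
	return bus->idle;
}

static irqreturn_t bounded_bus_interrupt(struct fake_bus *bus)
{
	irqreturn_t ret = IRQ_NONE;
	unsigned int i;

	for (i = 0; i < MAX_OPS_PER_IRQ; i++) {
		uint32_t op = fake_readl(bus);

		switch (op) {
		case PDEV_BUS_OP_DONE:
			/* Report HANDLED only if earlier ops were processed. */
			return ret;
		case PDEV_BUS_OP_REMOVE_DEV:
			bus->removes++;
			break;
		case PDEV_BUS_OP_ADD_DEV:
			bus->adds++;
			break;
		default:
			/* Unknown value, e.g. 0xffffffff when nothing is there. */
			return ret;
		}
		ret = IRQ_HANDLED;
	}
	/* Bound reached: stop instead of spinning in the handler. */
	return ret;
}
```

With a script of one add and one remove followed by DONE this returns
IRQ_HANDLED. With nothing but 0xffffffff it returns IRQ_NONE after a single
read. A register stuck on ADD stops after MAX_OPS_PER_IRQ iterations.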