Hi, Very below is my patch proposal with a comment, which in my opinion is precious enough to save it for future help in reading and understanding the code. I hope Alan will not blame me I've not asked for his permission before sending, and he would ack this patch as it is or at least most of this. Thanks & regards, Jarek P. On Wed, Jul 25, 2007 at 03:46:56PM +0100, Alan Cox wrote: > > > The code in question lib8390.c does > > > > > > disable_irq(); > > > fiddle_with_the_network_card_hardware() > > > enable_irq(); > > ... > > > > > > No idea how this affects the network card, as the code there must be > > > able to handle interrupts, which are not originated from the card due to > > > interrupt sharing. > > > > I think, in this last yesterday's patch Ingo could be right, yet! > > The comment at the beginnig points this is done like that because > > of chip's slowness. And problems with timing are mysterious. > > > > On the other hand author of this code didn't use spin_lock_irqsave > > for some reason, probably after testing this option too. So, I hope > > this is the right path, but alas, I'm not sure this patch has to > > prove this 100%. > > The author (me) didn't use spin_lock_irqsave because the slowness of the > card means that approach caused horrible problems like losing serial data > at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA > chips with FPGA front ends. > > > Anyway, in my opinion this situation where interrupts could/have_to > > be used for such strange things should confirm the need of more > > options for handling irqs individually. > > Ok the logic behind the 8390 is very simple: > > Things to know > - IRQ delivery is asynchronous to the PCI bus > - Blocking the local CPU IRQ via spin locks was too slow > - The chip has register windows needing locking work > > So the path was once (I say once as people appear to have changed it > in the mean time and it now looks rather bogus if the changes to use > disable_irq_nosync_irqsave are disabling the local IRQ) > > > Take the page lock > Mask the IRQ on chip > Disable the IRQ (but not mask locally- someone seems to have > broken this with the lock validator stuff) > [This must be _nosync as the page lock may otherwise > deadlock us] > Drop the page lock and turn IRQs back on > > At this point an existing IRQ may still be running but we can't > get a new one > > Take the lock (so we know the IRQ has terminated) but don't mask > the IRQs on the processor > Set irqlock [for debug] > > Transmit (slow as ****) > > re-enable the IRQ > > > We have to use disable_irq because otherwise you will get delayed > interrupts on the APIC bus deadlocking the transmit path. > > Quite hairy but the chip simply wasn't designed for SMP and you can't > even ACK an interrupt without risking corrupting other parallel > activities on the chip. > > Alan > ------> From: Jarek Poplawski <jarkao2@xxxxx> Subject: lib8390: comment on locking by Alan Cox Additional explanation of problems with locking by Alan Cox. Signed-off-by: Jarek Poplawski <jarkao2@xxxxx> Cc: Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> Cc: Paul Gortmaker <p_gortmaker@xxxxxxxxx> Cc: Jeff Garzik <jeff@xxxxxxxxxx> --- diff -Nurp 2.6.23-rc1-/drivers/net/lib8390.c 2.6.23-rc1/drivers/net/lib8390.c --- 2.6.23-rc1-/drivers/net/lib8390.c 2007-07-09 01:32:17.000000000 +0200 +++ 2.6.23-rc1/drivers/net/lib8390.c 2007-07-26 13:55:17.000000000 +0200 @@ -143,6 +143,52 @@ static void __NS8390_init(struct net_dev * annoying the transmit function is called bh atomic. That places * restrictions on the user context callers as disable_irq won't save * them. + * + * Additional explanation of problems with locking by Alan Cox: + * + * "The author (me) didn't use spin_lock_irqsave because the slowness of the + * card means that approach caused horrible problems like losing serial data + * at 38400 baud on some chips. Rememeber many 8390 nics on PCI were ISA + * chips with FPGA front ends. + * + * Ok the logic behind the 8390 is very simple: + * + * Things to know + * - IRQ delivery is asynchronous to the PCI bus + * - Blocking the local CPU IRQ via spin locks was too slow + * - The chip has register windows needing locking work + * + * So the path was once (I say once as people appear to have changed it + * in the mean time and it now looks rather bogus if the changes to use + * disable_irq_nosync_irqsave are disabling the local IRQ) + * + * + * Take the page lock + * Mask the IRQ on chip + * Disable the IRQ (but not mask locally- someone seems to have + * broken this with the lock validator stuff) + * [This must be _nosync as the page lock may otherwise + * deadlock us] + * Drop the page lock and turn IRQs back on + * + * At this point an existing IRQ may still be running but we can't + * get a new one + * + * Take the lock (so we know the IRQ has terminated) but don't mask + * the IRQs on the processor + * Set irqlock [for debug] + * + * Transmit (slow as ****) + * + * re-enable the IRQ + * + * + * We have to use disable_irq because otherwise you will get delayed + * interrupts on the APIC bus deadlocking the transmit path. + * + * Quite hairy but the chip simply wasn't designed for SMP and you can't + * even ACK an interrupt without risking corrupting other parallel + * activities on the chip." [lkml, 25 Jul 2007] */ - To unsubscribe from this list: send the line "unsubscribe linux-net" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html