On Mon, 21 Jul 2003, Daniel Ritz wrote:

> On Mon July 21 2003 21:44, Javier Achirica wrote:
> >
> > On Mon, 21 Jul 2003, Daniel Ritz wrote:
> >
> > > On Mon July 21 2003 13:00, Javier Achirica wrote:
> > > >
> > > > Daniel,
> > > >
> > > > Thank you for your patch. Some comments about it:
> > > >
> > > > - I'd rather fix whatever is broken in the current code than going back to
> > > > spinlocks, as they increase latency and reduce concurrency. In any case,
> > > > please check your code. I've seen a spinlock in the interrupt handler that
> > > > may lock the system.
> > >
> > > but we need to protect from interrupts while accessing the card and waiting for
> > > completion. semaphores don't protect you from that. spin_lock_irqsave does. the
> > > spin_lock in the interrupt handler is there to protect from interrupts from
> > > other processors in a SMP system (see Documentation/spinlocks.txt) and is btw.
> > > a no-op on UP. and semaphores are quite heavy....
> >
> > Not really. You can still read the received packets from the card (as
> > you're not issuing any command and are using the other BAP) while a
> > command is in progress. There are some specific cases in which you need
> > to have protection, and those cases are avoided with the down_trylock.
> >
>
> ok, i think i have to look closer...if the card can handle that then we don't need
> to irq-protect all the areas i did protect...but i do think that those down_trylock and
> then the schedule_work should be replaced by a simple spin_lock_irqsave...
>
> i'll look closer at it tomorrow.
> you happen to have the tech spec lying around?

I have an old one, but I don't think that I'm allowed (by Cisco) to pass
it around.

> > AFAIK, interrupt serialization is assured by the interrupt handler, so you
> > don't need to do that.
> >
> > > > - The fix for the transmit code you mention, is about fixing the returned
> > > > value in case of error? If not, please explain it to me as I don't see any
> > > > other changes.
> > >
> > > fixes:
> > > - return values
> > > - when to free the skb, when not
> > > - disabling the queues
> > > - netif_wake_queue called from the interrupt handler only (and on the right
> > > net_device)
> > > - i think the priv->xmit stuff and then the schedule_work is evil:
> > > if you return 0 from the dev->hard_start_xmit then the network layer assumes
> > > that the packet was kfree_skb()'ed (which only frees the packet when the
> > > refcount drops to zero.) this is the cause for the keventd killing, for sure!
> > >
> > > if you return 0 you already kfree_skb()'ed the packet. and that's it.
> >
> > This is where I have the biggest problems. As I've read in
> > Documentation/networking/driver.txt, looks like the packet needs to be
> > freed "soon", but doesn't need to be freed before returning 0 in
> > hard_start_xmit. Did I get it wrong?
> >
>
> no, i got it wrong. but still...it's the xmit where the oops comes from....
>
> wait. isn't there a race in airo_do_xmit? at high xfer rates (when it oopses) the
> queue can wake right after it is stopped in the down_trylock section. so you can
> happen to lose an skb 'cos the write to priv->xmit is not protected at all and
> there should be a check so that only one skb can be queued there. no?
> (and then the irq-handler can wake the queue too)
>
> ok, i think i got it now. i'll do a new patch tomorrow or so that tries:
> - to fix the transmit not to oops
> - to avoid disabling the irq's whenever possible
> - using spinlocks instead of the heavier semaphores ('cos i think if it's done cleaner
> than i did it now, it's faster than the semas, and to make hch happy :)

Yes! This is the race that may be causing problems. I'll try to fix it and
we may compare semaphore vs. spinlock implementation.

Javier Achirica
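
The race discussed above (an unprotected write to priv->xmit that can
overwrite an skb still waiting for the deferred transmit) can be modelled
outside the kernel. What follows is only a userspace sketch, not the airo
driver code: struct pkt, struct xmit_slot, slot_put() and slot_take() are
hypothetical names, and a C11 compare-and-swap stands in for whichever lock
(semaphore or spinlock) a real fix would use. It illustrates the "only one
skb can be queued there" check: a second packet is refused instead of
silently replacing the first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct pkt {
    int id;
};

/* Hypothetical single-slot handoff, modelling the role of priv->xmit. */
struct xmit_slot {
    _Atomic(struct pkt *) pending; /* at most one packet awaiting transmit */
};

#define SLOT_OK   0 /* packet accepted into the slot */
#define SLOT_BUSY 1 /* slot already holds a packet; caller keeps ownership */

void slot_init(struct xmit_slot *s)
{
    atomic_init(&s->pending, (struct pkt *)NULL);
}

/*
 * Transmit path: store the packet only if the slot is empty.
 * The compare-and-swap makes the "is it empty?" check and the write one
 * indivisible step, so a concurrent caller cannot overwrite a pending packet.
 */
int slot_put(struct xmit_slot *s, struct pkt *p)
{
    struct pkt *expected = NULL;

    assert(p != NULL);
    return atomic_compare_exchange_strong(&s->pending, &expected, p)
               ? SLOT_OK
               : SLOT_BUSY;
}

/* Deferred worker: take the pending packet and empty the slot (NULL if none). */
struct pkt *slot_take(struct xmit_slot *s)
{
    return atomic_exchange(&s->pending, (struct pkt *)NULL);
}
```

In this model a caller that gets SLOT_BUSY still owns its packet and has to
hold it back or report the device as busy. How that maps onto the driver's
hard_start_xmit return values and queue stop/wake calls is left to the
actual patch.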