On Tue, Nov 03, 2015 at 04:10:23PM -0600, Jonathan David wrote:
> On 11/03/2015 01:42 PM, Henrik Austad wrote:
> >On Tue, Nov 03, 2015 at 11:43:21AM -0600, Jonathan David wrote:
> >>On 10/22/2015 12:59 AM, Henrik Austad wrote:
>
> >>>>Adding a delay after long series of writes gives them time to
> >>>>complete, and for higher priority tasks to run unimpeded.
> >>>
> >>>Aren't we running with threaded interrupts?
> >>>
> >>>What happens to the thread(s) pushing data to the network?
> >>>What about xmit-buffer once it is full? Which thread will block on send or
> >>>have its sk_buff dropped?
> >>
> >>All of this is totally irrelevant to the problem we are seeing.
> >
> >If this is irrelevant, why hack at the network-driver, hmm?
>
> It is relevant to the network driver, as this is where the symptoms were
> discovered; however, it has no relation to the packet delivery path. This is
> related purely to link configuration.

I was under the impression that a PCI link configuration/training was
down to speed etc, not how many MMIO read/writes it could do. Then
again, a lot of this stuff is pure (black) magic.

> >>The e1000x driver itself is not responsible for the delay here.
> >
> >... then why hack the network-driver?
>
> Lack of better known options.
>
> >>The issue is with PCI where issuing a large number of MMIO writes
> >>followed by a read (to force said writes to execute) will stall the CPU.
> >>When the CPU is stalled, no interrupts are serviced, including the local
> >>apic timer interrupt, which was responsible for waking up cyclictest.
> >>This behavior was observed within traces gathered from cyclictest with
> >>ftrace enabled.
> >
> >So you get bogged down with interrupts disabled;
>
> No, interrupts are entirely enabled while the PCI MMIO writes/read are
> issued; but the local apic timer still arrives late, presumably because the
> CPU is waiting to complete whatever writes remain in the buffer.

Heh, strange, is the interrupt signal itself delivered late as well, or
just the handling of it?

> I think this might be the root of our miscommunication. You are asking good
> questions about threaded interrupts, etc, but it isn't clear how they are
> related to the specific problem we are seeing.

Perhaps a trace of the problem could be shared? A full function-trace
with irq-events and timer-events would be appreciated :)

-- 
Henrik Austad