Re: Linux mask_msi_irq() question

Kanoj Sarcar <kanojsarcar@xxxxxxxxx> · Wed, 25 Aug 2010 00:40:00 -0700 (PDT)

--- On Tue, 8/24/10, Grant Grundler <grundler@xxxxxxxxxxxxxxxx> wrote:

> From: Grant Grundler <grundler@xxxxxxxxxxxxxxxx>
> Subject: Re: Linux mask_msi_irq() question
> To: "Kanoj Sarcar" <kanojsarcar@xxxxxxxxx>
> Cc: "Jesse Barnes" <jbarnes@xxxxxxxxxxxxxxxx>, linux-pci@xxxxxxxxxxxxxxx
> Date: Tuesday, August 24, 2010, 10:00 PM
> On Mon, Aug 23, 2010 at 08:59:25PM -0700, Kanoj Sarcar wrote:
> ...
> > I did go over the specs some, two points of reference:
> > 
> > a. PCIE base spec rev 3.0 version .71 released May 25, 2010:
> > section 2.4.1 table D2a ensures posted write (such as msix
> > write issued by device) does not pass read completion issued
> > by device (such as read-completion for MSIX entry mask).
> > 
> > b. MSIX ECN comments (section 6.8): "An MSI-X vector is
> > masked when its associated MSI-X Table entry Mask bit or the
> > MSI-X Function Mask bit is set. While a vector is masked,
> > the function is prohibited from sending the associated message,
> > and the function must set the associated Pending bit whenever
> > the function would otherwise send the message."
> > 
> > Given these two, roughly the host action of "write entry mask";
> > "read entry mask" _should_ apparently provide an interrupt barrier.
> 
> Yes, it will make sure one can mask MSI and not drop the MSI signal.
> I don't believe it provides any sort of barrier. In flight MSIs
> still need to be dealt with on the host side.

Agreed about in-flight MSIs: chipset/platform-specific code
needs to synchronize across multiple CPUs if there is a
strict guarantee requirement.

But is it acceptable for the device to send out an MSI-X message
after it has responded to the host's entry mask read?
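
To make the question concrete, here is a minimal userspace sketch of the
"write entry mask; read entry mask" sequence together with the Pending
bit rule quoted from the MSI-X text. It is a toy model, not kernel code:
struct msix_model, host_set_mask() and device_raise() are hypothetical
names, and the table size is made up. Only the 16-byte entry layout and
the position of the Mask bit (bit 0 of Vector Control, at byte offset 12)
come from the MSI-X definition. The model assumes the device honours the
mask as soon as the readback returns, which is exactly the behaviour
being asked about, so it illustrates the expectation rather than
proving it.

```c
/* Userspace toy model only: not kernel code, not a real device. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSIX_ENTRY_SIZE          16  /* bytes per MSI-X table entry */
#define MSIX_ENTRY_VECTOR_CTRL   12  /* byte offset of Vector Control */
#define MSIX_ENTRY_CTRL_MASKBIT  1u  /* bit 0 of Vector Control */
#define MODEL_NR_VECTORS         4   /* hypothetical table size */

struct msix_model {
    uint32_t table[MODEL_NR_VECTORS * MSIX_ENTRY_SIZE / 4];
    uint32_t pending;                 /* stands in for the Pending Bit Array */
    unsigned sent[MODEL_NR_VECTORS];  /* messages the "device" has sent */
};

/* Locate the Vector Control dword of one table entry. */
static uint32_t *vector_ctrl(struct msix_model *m, unsigned entry)
{
    assert(entry < MODEL_NR_VECTORS);
    return &m->table[(entry * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL) / 4];
}

/*
 * Host side: write the Mask bit, then read it back. On real hardware
 * the read forces the posted write out to the device; here it is just
 * a second memory access. Returns the value read back.
 */
static uint32_t host_set_mask(struct msix_model *m, unsigned entry, bool mask)
{
    volatile uint32_t *ctrl = vector_ctrl(m, entry);
    uint32_t val = *ctrl;
    uint32_t readback;

    if (mask)
        val |= MSIX_ENTRY_CTRL_MASKBIT;
    else
        val &= ~MSIX_ENTRY_CTRL_MASKBIT;
    *ctrl = val;
    readback = *ctrl;  /* stands in for the flushing read */

    /* Modelled device reaction: a latched message goes out on unmask. */
    if (!mask && (m->pending & (1u << entry))) {
        m->pending &= ~(1u << entry);
        m->sent[entry]++;
    }
    return readback;
}

/*
 * Device side: wants to signal 'entry'. While masked it must set the
 * Pending bit instead of sending. Returns true if a message was sent.
 */
static bool device_raise(struct msix_model *m, unsigned entry)
{
    if (*vector_ctrl(m, entry) & MSIX_ENTRY_CTRL_MASKBIT) {
        m->pending |= 1u << entry;
        return false;
    }
    m->sent[entry]++;
    return true;
}
```

In this model, once host_set_mask(&m, n, true) has returned,
device_raise(&m, n) can only set the Pending bit. Whether a real device
is required to behave that way at the moment the read completion is
issued is the open point.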

> 
> > 
> > But if you wanted to play the devil's advocate, in the
> > comment b. above, "While a vector is masked" is not clearly
> > defined; IE if the host does a pio write to mask, then
> > reads back the mask (which is to a certain extent orthogonal
> > to device noticing and acting on the mask change), does MSIX
> > spec actually require the device to provide an interrupt
> > barrier?
> 
> Is the mask exposed via IO Port space? I don't think so.
> So I don't think there is a 'PIO vs MMIO' race possible here.

No, the mask is not in I/O port space.

> 
> > Like you mention, there are already chipset issues
> > anyway.
> > 
> > Also, what happens if the flushing read is deleted? Since
> > this is being invoked on every MSIX reception on host (at
> > least on x86[_64]), it is rather a costly operation. IE, if
> > the read is removed, and an interrupt does creep in, what are
> > the problems? Kernel panic, NMI etc? IE does some piece of
> > kernel code (irq rebalance etc) actually rely on the barrier?
> 
> Kernel could panic if the interrupt handler is invoked when it
> shouldn't be. e.g. data structures have been deallocated and
> pointers to those structures set to NULL. I'm doubtful this would
> cause an NMI/MCA unless some DMA or MMIO state was changed as a
> result of the interrupt (which is actually more likely than
> I originally thought).

In general, I agree. I am under the assumption, though, that
Linux handles misbehaving devices that generate spurious
interrupts by turning them off after a while. So I think each
port takes care of such interrupts, potentially by maintaining
state on whether the vector should trigger or not, which is
checked before invoking the driver ISR.
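
As a rough illustration of that assumption, the sketch below models an
"armed" flag that is checked before the driver ISR is called, plus a
counter that shuts the line off when nearly every interrupt in a window
goes unhandled. Everything here is hypothetical (struct irq_model,
model_handle_irq(), the armed flag). The window of 100000 interrupts and
the limit of 99900 unhandled are the values I recall from the generic
spurious-interrupt code in kernel/irq/spurious.c; treat them as an
assumption, not as a statement of what any given kernel does.

```c
/* Userspace toy model only: not the kernel's implementation. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IRQ_WINDOW       100000u  /* interrupts per evaluation window (assumed) */
#define MODEL_UNHANDLED_LIMIT   99900u  /* unhandled count that disables the line (assumed) */

struct irq_model {
    unsigned irq_count;       /* interrupts seen in the current window */
    unsigned irqs_unhandled;  /* of those, how many nobody handled */
    bool disabled;            /* line turned off as misbehaving */
    bool armed;               /* hypothetical "vector may trigger" state */
    unsigned isr_calls;       /* times the driver ISR was invoked */
};

/* Deliver one interrupt. Returns true if the driver ISR was invoked. */
static bool model_handle_irq(struct irq_model *d)
{
    bool handled = false;

    assert(d != NULL);
    if (d->disabled)
        return false;

    if (d->armed) {
        d->isr_calls++;       /* stands in for calling the driver ISR */
        handled = true;
    } else {
        d->irqs_unhandled++;  /* stray interrupt: count it, do not dispatch */
    }

    if (++d->irq_count < MODEL_IRQ_WINDOW)
        return handled;

    /* End of window: disable the line if almost nothing was handled. */
    if (d->irqs_unhandled > MODEL_UNHANDLED_LIMIT)
        d->disabled = true;
    d->irq_count = 0;
    d->irqs_unhandled = 0;
    return handled;
}
```

With a scheme like this, a single laggard interrupt arriving while the
vector is not armed is counted and dropped without reaching the driver,
which is why an occasional stray would not necessarily be fatal.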

> 
> I suspect any code that mucks with interrupt state would also depend
> on the interrupt not triggering at that moment.
> 

At least for the part that I understand, I agree with the above.
When x86 receives an MSI-X interrupt, it masks the entry and reads
the mask back on the CPU that took the interrupt. The interrupt is
not triggering at this point. Which raises the question: why read
back at this point?

The part that I don't understand is how inter-CPU synchronization
is achieved in the IRQ rebalancing case, the device deinit case, etc.
Does Linux currently take a strict approach to barriers in
these cases, or is it a loose approach where a laggard instance
of the interrupt can still creep to an unintended CPU? In the
loose case, the readback does not make much sense to me.

Kanoj

> cheers,
> grant
> 
