Re: PCI: MSI interrupts masked using prohibited method

David Vrabel <david.vrabel@xxxxxxx> · Fri, 25 Jul 2008 17:37:49 +0100

Michal Schmidt wrote:
On Fri, 25 Jul 2008 07:42:52 -0600
Matthew Wilcox <matthew@xxxxxx> wrote:

On Fri, Jul 25, 2008 at 03:29:18PM +0200, Michal Schmidt wrote:
The interesting thing is that I can see Destination ID bits of MSI
Message Address change correctly in lspci output. But the interrupt
is still delivered load-balanced to all CPUs even though the
Destination ID identifies the single CPU I asked for. It seems the
device only takes the new Message Address setting into account when
the MSI Enable bit in the Message Control register is changed from
0 to 1. I tested this by setting the MSI enable bit to 0 and then
immediately back to 1 at the end of
io_apic_64.c:set_msi_irq_affinity().
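
The behaviour described above can be modelled in a few lines of C. This is only a sketch under stated assumptions: `struct latching_dev`, `write_msg_ctrl()` and `set_affinity_with_toggle()` are hypothetical names invented for illustration, not kernel code, and the model assumes the device copies the Message Address into its working state only when MSI Enable goes from 0 to 1.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_FLAGS_ENABLE 0x0001u  /* MSI Enable, bit 0 of Message Control */

/* Hypothetical device model: latches the address only on enable 0 -> 1. */
struct latching_dev {
    uint16_t msg_ctrl;      /* Message Control register */
    uint32_t addr_reg;      /* Message Address as read back (what lspci shows) */
    uint32_t addr_latched;  /* address the device actually sends MSIs to */
};

/* Write Message Control; latch the address on a 0 -> 1 enable transition. */
static void write_msg_ctrl(struct latching_dev *d, uint16_t val)
{
    int was_on = d->msg_ctrl & MSI_FLAGS_ENABLE;

    d->msg_ctrl = val;
    if (!was_on && (val & MSI_FLAGS_ENABLE))
        d->addr_latched = d->addr_reg;
}

/* Write the new address, then toggle MSI Enable off and straight back on. */
static void set_affinity_with_toggle(struct latching_dev *d, uint32_t addr)
{
    assert(d->msg_ctrl & MSI_FLAGS_ENABLE);  /* sketch assumes MSI is already on */
    d->addr_reg = addr;
    write_msg_ctrl(d, (uint16_t)(d->msg_ctrl & ~MSI_FLAGS_ENABLE));
    write_msg_ctrl(d, (uint16_t)(d->msg_ctrl | MSI_FLAGS_ENABLE));
}
```

In this model a plain write to `addr_reg` changes what is read back but not `addr_latched`, which matches the observation that lspci shows the new Destination ID while delivery is unchanged.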

Is this a permitted behaviour for the device? I couldn't find
anything in the PCI specification that mentions it.
The spec says that system software should disable MSI before setting
message address and data (PCI 3.0 section 6.8.3.1 MSI configuration).
The kernel doesn't do this.
I don't think that's necessary.  However, the thought occurs that we
ought to disable MSI, then write the address, then re-enable MSI.  It
doesn't cause a problem at the moment because we don't change the
top 32 bits of the address (at least on any of my systems ..) but
theoretically if we were to use a 64-bit address, we would experience
MSIs being sent to an address that was a mixture of the top 32 bits of
the old address and the bottom 32 bits of the new address.
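
A minimal sketch of the 64-bit tearing case, assuming the two halves of the address are written with separate config writes and the device raises an MSI in between. `struct msi_regs`, `msi_target()` and `torn_address()` are hypothetical names used only for illustration, and the address values in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two MSI address registers. */
struct msi_regs {
    uint32_t addr_lo;  /* Message Address */
    uint32_t addr_hi;  /* Message Upper Address */
};

/* Address the device would target if it raised an MSI right now. */
static uint64_t msi_target(const struct msi_regs *r)
{
    return ((uint64_t)r->addr_hi << 32) | r->addr_lo;
}

/*
 * Update the address with two 32-bit writes and return the address a
 * device would have used had it raised an MSI between the two writes.
 */
static uint64_t torn_address(struct msi_regs *r, uint64_t new_addr)
{
    uint64_t seen;

    r->addr_lo = (uint32_t)new_addr;          /* first config write */
    seen = msi_target(r);                     /* MSI raised here: old top, new bottom */
    r->addr_hi = (uint32_t)(new_addr >> 32);  /* second config write */
    assert(msi_target(r) == new_addr);        /* consistent only after both writes */
    return seen;
}
```

Starting from registers holding 0x00000001fee00000 and updating to 0x00000002fee01000, `torn_address()` returns 0x00000001fee01000, which is neither the old nor the new address.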

We definitely can already get tearing when we've written the lower
address register but not the data register yet (also true for MSIX, by
the way).  So we ought to fix this properly.

I really don't think we should be enabling/disabling MSI while
interrupts might still be generated.  There are cases where interrupts
will be lost.  Consider PCIe, where we might end up with a situation
where MSI is disabled and then re-enabled so quickly that no
periodic line interrupt message is sent by the device.

The message address and data should only be modified while the vector is
masked (to avoid the aforementioned 'tearing').  This means that setting
IRQ affinity cannot be done on devices without per-vector mask bits.  I
don't think this is a problem.

In vague pseudo-code, set_affinity() should be something like this:

int did_mask = msi_mask_vector();
if (!did_mask) {
    /* no per-vector mask bit, so the update cannot be done safely */
    return -ENOTSUPP;
}
/* fiddle with address and data now */
msi_unmask_vector();
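
The pseudo-code above can be fleshed out into a small userspace model. This is a sketch under assumptions, not the kernel implementation: `struct msi_vec` and `msi_set_affinity()` are hypothetical, the mask/unmask helpers only flip a flag instead of touching config space, and `ENOTSUPP` is defined locally because it is a kernel-internal error code that userspace `<errno.h>` does not provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef ENOTSUPP
#define ENOTSUPP 524  /* kernel-internal value; absent from userspace errno.h */
#endif

/* Hypothetical model of one MSI vector; not the kernel's data structure. */
struct msi_vec {
    bool has_mask_bit;  /* device implements per-vector masking */
    bool masked;
    uint32_t addr_lo;
    uint32_t addr_hi;
    uint16_t data;
};

/* Returns true if the vector is now masked, false if masking is unsupported. */
static bool msi_mask_vector(struct msi_vec *v)
{
    if (!v->has_mask_bit)
        return false;
    v->masked = true;
    return true;
}

static void msi_unmask_vector(struct msi_vec *v)
{
    v->masked = false;
}

/* Retarget the vector; refuse if it cannot be masked for the update. */
static int msi_set_affinity(struct msi_vec *v, uint64_t addr, uint16_t data)
{
    if (!msi_mask_vector(v))
        return -ENOTSUPP;

    /* The device cannot send while masked, so no torn message is possible. */
    v->addr_lo = (uint32_t)addr;
    v->addr_hi = (uint32_t)(addr >> 32);
    v->data = data;
    assert(v->masked);

    msi_unmask_vector(v);
    return 0;
}
```

A vector without a mask bit is left untouched and the call returns -ENOTSUPP, which is the trade-off described above: no affinity changes on devices that lack per-vector masking.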

David
--
David Vrabel, Software Engineer, Drivers group  Tel: +44 (0)1223 692562
CSR plc, Churchill House, Cambridge Business Park, Cowley Road, CB4 0WZ