Re: [PATCH] PCI: Enable Function Level Reset(FLR) for PCI-E

Matthew Wilcox <matthew@xxxxxx> · Tue, 23 Sep 2008 22:34:30 -0600

On Wed, Sep 24, 2008 at 11:26:41AM +0800, Yang, Sheng wrote:
> On Wednesday 24 September 2008 03:16:40 Matthew Wilcox wrote:
> > /**
> >  * pcie_do_flr() - execute Function Level Reset
> >  * @dev: Device to reset
> >  *
> >  * Function Level Reset is a feature introduced by the PCIe spec version
> >  * XXX.  It allows one function of a multifunction PCI device to be
> >  * reset without affecting the other functions in the same device.
> >  * Resetting the device will make the contents of PCI configuration space
> >  * random, so any caller of this must be prepared to reinitialise the
> >  * device including MSI, bus mastering, BARs, decoding IO and memory
> > spaces, * etc.
> >  *
> >  * Returns 0 if the device was successfully reset or an error code if
> >  * the device doesn't support FLR or we failed to reset the device
> >  * properly.
> >  */
> 
> Matthew, thanks to be so detail...

You're welcome.  I wonder if perhaps the function would be better named
pci_reset_function() or pci_function_reset().  I also notice that I
didn't implement what I documented above wrt the reset failing.

> > > +int pcie_do_flr(struct pci_dev *dev)
> > > +{
> > > +     u16 status;
> > > +     u32 cap;
> > > +     int exppos = pci_find_capability(dev, PCI_CAP_ID_EXP);
> > > +
> > > +     if (!exppos)
> > > +             return -ENOTSUPP;
> >
> >                 return -ENOTTY;
> >
> > > +     pci_read_config_dword(dev, exppos + PCI_EXP_DEVCAP, &cap);
> > > +     if (!(cap & PCI_EXP_DEVCAP_FLR))
> > > +             return -ENOTSUPP;
> >
> >                 return -ENOTTY;
> 
> I will modify this. But I still don't understand why -ENOTTY... I think there 
> are some history about it?

Yes.  ENOTTY means "This operation cannot be performed on this device".
EOPNOTSUPP is some networking error, I forget the details now.

> > > +
> > > +     pci_block_user_cfg_access(dev);
> > > +
> > > +     /* Clear entire Command register */
> > > +     pci_write_config_word(dev, PCI_COMMAND, 0);
> > > +     pci_write_config_word(dev, exppos + PCI_EXP_DEVCTL, 0);
> >
> > Is this wise?  Clearing the 'disable INTx' bit (10) might cause the
> > device to generate interrupts on its interrupt pin which could cause
> > lockups.  I'm also not sure we need to clear any of the bits in
> > PCI_EXP_DEVCTL -- can you explain?
> 
> >From the implemention note of FLR, "Software clears the entire Command 
> register, disabling the Function from issuing any new Requests." So here we 
> are. 

Ah yes, it was on page 384 so I didn't notice it.

I think this implementation note is incorrect in that clearing all the
bits does not prevent the device from generating INTx interrupts.  Might
be worth raising with the PCI committee.

This implementation note also says, "Software that's performing
the FLR synchronizes with other software that might potentially access
the Function directly, and ensures such accesses do not occur during
this algorithm."

If we have a shared interrupt and the device being reset still has a
handler attached, it's entirely possible that another device will
generate an interrupt and cause the interrupt handler for this device to
try to access the device during the FLR.

We could fix this by temporarily unregistering the interrupt handler for
this device, or calling disable_irq().

> And from PCI spec 6.2.2. Device Control
> 
> "The Command register provides coarse control over a device's ability to 
> generate and respond to PCI cycles. When a 0 is written to this register, the 
> device is logically disconnected from the PCI bus for all accesses except 
> configuration accesses. "

To be fair, that's talking about Conventional PCI which doesn't transmit
INTx over the PCI bus, they're on sideband signals.

> > > +             dev_info(&dev->dev, "trasaction pending when doing FLR, "
> > > +                     "waiting for another 5 seconds\n");
> >
> > I don't ever want to see 'FLR' in a message the user might see.  If you
> > have to, spell it out as 'Function Level Reset' or just say "resetting
> > function" or something.  We use far too many acronyms as it is.  What
> > was wrong with the text I proposed in my first review?
> 
> Um... I just think that message is too ambiguous to know the exactly reason, 
> e.g. "still busy", user can ask "what are you waiting for?" So I want to show 
> what the function is doing here, and didn't aware the acronyms problem...
> 
> What's about "Still busy after 100ms when resetting function. Waiting 5 
> seconds\n"?

Looks fine.  Maybe we can be more pithy, how about:
"Function reset incomplete after 100ms, sleeping for 5 seconds\n"
?

> > Let's talk about what else we might need to do here.  The spec says:
> >
> >    Any outstanding INTx interrupt asserted by the Function must be
> >    de-asserted by sending the corresponding Deassert_INTx Message prior
> >    to starting the FLR.
> >
> > So we need to make sure there are no pending interrupts for this
> > function before issuing the FLR.  Is that the responsibility of the
> > caller?  We should say so if it is.  If we're going to do it in this
> > function, we should use disable_irq() to be sure that the interrupt
> > isn't currently executing on some other CPU (having first checked that
> > the device isn't using MSI-X, because if it is, we don't use the
> > interrupt in pdev->irq).  If we're using a shared interrupt, other users
> > are going to suffer.
> >
> > I still don't really understand how this flr fits in with every other
> > part of the system.  Is the device supposed to be quiesced before this
> > is called?  If so, a lot of what it does is pointless.  Or is it
> > supposed to do the quiescing?  If so, it's missing some bits.
> 
> I think the device didn't suppose to be completely quiesced before doing flr. 
> The spec also said
> 
> "Note that the controls that enable the Function to initiate requests on PCI 
> Express are cleared, including Bus Master Enable, MSI Enable, and the like, 
> effectively causing the Function to *become quiescent* on the Link."

That paragraph is describing the effect of the FLR on the function, not
what the software stack does first.

> So I think the caller should take care of INTx, and this routine cover others. 
> If the interrupts are shared, seems we can't prevent other user from 
> suffering a little. But luckily, the most modern device which need to be 
> reset should have used MSI/MSI-X, so I think it's reasonable.

They ought to use MSIs, yes.  I fear that they won't.

> But what's the missing bits your mentioned?

Things like the interrupts, the MSI/MSI-X state, the BARs ... looking at
it again, there's pci_save_state() and pci_restore_state() which should
do the trick.

The use cases mentioned in the PCIe spec imply that there are different
use cases.  Ones where the driver has detached from the device in an
orderly manner already (the 'partitioned system' example), and error
handling cases (where we might still have a driver attached).  I almost
wonder if we want two functions, one that assumes the device is already
quiesced and just goes ahead and resets it, and another that knows it
has to do all the locking, disabling, etc.

By the way, what happens if the admin loads a module that wants to probe
this device while it's being reset?  It seems like it might well send
configuration cycles to the card, or even mmio cycles.  I almost wonder
if we shouldn't remove the device from the system while its being reset,
then re-add it after the reset has completed.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html