On Wednesday 24 September 2008 12:34:30 Matthew Wilcox wrote: > On Wed, Sep 24, 2008 at 11:26:41AM +0800, Yang, Sheng wrote: > > On Wednesday 24 September 2008 03:16:40 Matthew Wilcox wrote: > > > /** > > > * pcie_do_flr() - execute Function Level Reset > > > * @dev: Device to reset > > > * > > > * Function Level Reset is a feature introduced by the PCIe spec > > > version * XXX. It allows one function of a multifunction PCI device to > > > be * reset without affecting the other functions in the same device. * > > > Resetting the device will make the contents of PCI configuration space > > > * random, so any caller of this must be prepared to reinitialise the * > > > device including MSI, bus mastering, BARs, decoding IO and memory > > > spaces, * etc. > > > * > > > * Returns 0 if the device was successfully reset or an error code if > > > * the device doesn't support FLR or we failed to reset the device > > > * properly. > > > */ > > > > Matthew, thanks to be so detail... > > You're welcome. I wonder if perhaps the function would be better named > pci_reset_function() or pci_function_reset(). I also notice that I > didn't implement what I documented above wrt the reset failing. OK, I will add the check for completion. > > > > +int pcie_do_flr(struct pci_dev *dev) > > > > +{ > > > > + u16 status; > > > > + u32 cap; > > > > + int exppos = pci_find_capability(dev, PCI_CAP_ID_EXP); > > > > + > > > > + if (!exppos) > > > > + return -ENOTSUPP; > > > > > > return -ENOTTY; > > > > > > > + pci_read_config_dword(dev, exppos + PCI_EXP_DEVCAP, &cap); > > > > + if (!(cap & PCI_EXP_DEVCAP_FLR)) > > > > + return -ENOTSUPP; > > > > > > return -ENOTTY; > > > > I will modify this. But I still don't understand why -ENOTTY... I think > > there are some history about it? > > Yes. ENOTTY means "This operation cannot be performed on this device". > EOPNOTSUPP is some networking error, I forget the details now. > > > > > + > > > > + pci_block_user_cfg_access(dev); > > > > + > > > > + /* Clear entire Command register */ > > > > + pci_write_config_word(dev, PCI_COMMAND, 0); > > > > + pci_write_config_word(dev, exppos + PCI_EXP_DEVCTL, 0); > > > > > > Is this wise? Clearing the 'disable INTx' bit (10) might cause the > > > device to generate interrupts on its interrupt pin which could cause > > > lockups. I'm also not sure we need to clear any of the bits in > > > PCI_EXP_DEVCTL -- can you explain? > > > > > >From the implemention note of FLR, "Software clears the entire Command > > > > register, disabling the Function from issuing any new Requests." So here > > we are. > > Ah yes, it was on page 384 so I didn't notice it. > > I think this implementation note is incorrect in that clearing all the > bits does not prevent the device from generating INTx interrupts. Might > be worth raising with the PCI committee. Feel a little strange... This kind of obvious issue should be expected to raise quickly... I will mail them for the reason. > > This implementation note also says, "Software that's performing > the FLR synchronizes with other software that might potentially access > the Function directly, and ensures such accesses do not occur during > this algorithm." > > If we have a shared interrupt and the device being reset still has a > handler attached, it's entirely possible that another device will > generate an interrupt and cause the interrupt handler for this device to > try to access the device during the FLR. > > We could fix this by temporarily unregistering the interrupt handler for > this device, or calling disable_irq(). I think unregister interrupt handler is better, after we set the interrupt disable bit in control register. > > And from PCI spec 6.2.2. Device Control > > > > "The Command register provides coarse control over a device's ability to > > generate and respond to PCI cycles. When a 0 is written to this register, > > the device is logically disconnected from the PCI bus for all accesses > > except configuration accesses. " > > To be fair, that's talking about Conventional PCI which doesn't transmit > INTx over the PCI bus, they're on sideband signals. Thanks for point it out... > > > > > + dev_info(&dev->dev, "trasaction pending when doing FLR, > > > > " + "waiting for another 5 seconds\n"); > > > > > > I don't ever want to see 'FLR' in a message the user might see. If you > > > have to, spell it out as 'Function Level Reset' or just say "resetting > > > function" or something. We use far too many acronyms as it is. What > > > was wrong with the text I proposed in my first review? > > > > Um... I just think that message is too ambiguous to know the exactly > > reason, e.g. "still busy", user can ask "what are you waiting for?" So I > > want to show what the function is doing here, and didn't aware the > > acronyms problem... > > > > What's about "Still busy after 100ms when resetting function. Waiting 5 > > seconds\n"? > > Looks fine. Maybe we can be more pithy, how about: > "Function reset incomplete after 100ms, sleeping for 5 seconds\n" > ? OK, that's enough for user. :) > > > > Let's talk about what else we might need to do here. The spec says: > > > > > > Any outstanding INTx interrupt asserted by the Function must be > > > de-asserted by sending the corresponding Deassert_INTx Message prior > > > to starting the FLR. > > > > > > So we need to make sure there are no pending interrupts for this > > > function before issuing the FLR. Is that the responsibility of the > > > caller? We should say so if it is. If we're going to do it in this > > > function, we should use disable_irq() to be sure that the interrupt > > > isn't currently executing on some other CPU (having first checked that > > > the device isn't using MSI-X, because if it is, we don't use the > > > interrupt in pdev->irq). If we're using a shared interrupt, other > > > users are going to suffer. > > > > > > I still don't really understand how this flr fits in with every other > > > part of the system. Is the device supposed to be quiesced before this > > > is called? If so, a lot of what it does is pointless. Or is it > > > supposed to do the quiescing? If so, it's missing some bits. > > > > I think the device didn't suppose to be completely quiesced before doing > > flr. The spec also said > > > > "Note that the controls that enable the Function to initiate requests on > > PCI Express are cleared, including Bus Master Enable, MSI Enable, and the > > like, effectively causing the Function to *become quiescent* on the > > Link." > > That paragraph is describing the effect of the FLR on the function, not > what the software stack does first. Yeah, that's what I means... > > > So I think the caller should take care of INTx, and this routine cover > > others. If the interrupts are shared, seems we can't prevent other user > > from suffering a little. But luckily, the most modern device which need > > to be reset should have used MSI/MSI-X, so I think it's reasonable. > > They ought to use MSIs, yes. I fear that they won't. > > > But what's the missing bits your mentioned? > > Things like the interrupts, the MSI/MSI-X state, the BARs ... looking at > it again, there's pci_save_state() and pci_restore_state() which should > do the trick. > > The use cases mentioned in the PCIe spec imply that there are different > use cases. Ones where the driver has detached from the device in an > orderly manner already (the 'partitioned system' example), and error > handling cases (where we might still have a driver attached). I almost > wonder if we want two functions, one that assumes the device is already > quiesced and just goes ahead and resets it, and another that knows it > has to do all the locking, disabling, etc. Seems it's reasonable. I will try to update patch to two functions. > By the way, what happens if the admin loads a module that wants to probe > this device while it's being reset? It seems like it might well send > configuration cycles to the card, or even mmio cycles. I almost wonder > if we shouldn't remove the device from the system while its being reset, > then re-add it after the reset has completed. It's unbelievable that the admin can catch at that small window... Yeah, just kidding, this can't be a excuse. How about add a lock on probing? And seems something like pci_set_power_state also got the similar problem? -- regards Yang, Sheng -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html