On 9/5/24 16:13, Philipp Stanner wrote: > On Thu, 2024-09-05 at 09:33 +0900, Damien Le Moal wrote: >> On 2024/09/05 6:10, Alex Williamson wrote: >>> On Wed, 4 Sep 2024 23:24:53 +0300 >>> Andy Shevchenko <andy.shevchenko@xxxxxxxxx> wrote: >>> >>>> Wed, Sep 04, 2024 at 12:07:21PM -0600, Alex Williamson kirjoitti: >>>>> On Wed, 04 Sep 2024 15:37:25 +0200 >>>>> Philipp Stanner <pstanner@xxxxxxxxxx> wrote: >>>>>> On Wed, 2024-09-04 at 17:25 +0900, Damien Le Moal wrote: >>>> >>>> ... >>>> >>>>>> If vfio-pci can get rid of pci_intx() alltogether, that might >>>>>> be a good >>>>>> thing. As far as I understood Andy Shevchenko, pci_intx() is >>>>>> outdated. >>>>>> There's only a hand full of users anyways. >>>>> >>>>> What's the alternative? >>>> >>>> From API perspective the pci_alloc_irq_vectors() & Co should be >>>> used. >>> >>> We can't replace a device level INTx control with a vector >>> allocation >>> function. >>> >>>>> vfio-pci has a potentially unique requirement >>>>> here, we don't know how to handle the device interrupt, we only >>>>> forward >>>>> it to the userspace driver. As a level triggered interrupt, >>>>> INTx will >>>>> continue to assert until that userspace driver handles the >>>>> device. >>>>> That's obviously unacceptable from a host perspective, so INTx >>>>> is >>>>> masked at the device via pci_intx() where available, or at the >>>>> interrupt controller otherwise. The API with the userspace >>>>> driver >>>>> requires that driver to unmask the interrupt, again resulting >>>>> in a call >>>>> to pci_intx() or unmasking the interrupt controller, in order >>>>> to receive >>>>> further interrupts from the device. Thanks, >>>> >>>> I briefly read the discussion and if I understand it correctly >>>> the problem here >>>> is in the flow: when the above mentioned API is being called. >>>> Hence it's design >>>> (or architectural) level of issue and changing call from foo() to >>>> bar() won't >>>> magically make problem go away. But I might be mistaken. >>> >>> Certainly from a vector allocation standpoint we can change to >>> whatever >>> is preferred, but the direct INTx manipulation functions are a >>> different thing entirely and afaik there's nothing else that can >>> replace them at a low level, nor can we just get rid of our calls >>> to >>> pci_intx(). Thanks, >> >> But can these calls be moved out of the spinlock context ? If not, >> then we need >> to clarify that pci_intx() can be called from any context, which will >> require >> changing to a GFP_ATOMIC for the resource allocation, even if the use >> case >> cannot trigger the allocation. This is needed to ensure the >> correctness of the >> pci_intx() function use. > > We could do that I guess. As I keep saying, it's not intended to have > pci_intx() allocate _permanently_. This is a temporary situation. > pci_intx() should have neither devres nor allocation. > >> Frankly, I am surprised that the might sleep splat you >> got was not already reported before (fuzzying, static analyzers might >> eventually >> catch that though). > > It's a super rare situation: > * pci_intx() has very few callers > * It only allocates if pcim_enable_device() instead of > pci_enable_device() ran. > * It only allocates when it's called for the FIRST TIME > * All of the above is only a problem while you hold a lock > >> >> The other solution would be a version of pci_intx() that has a gfp >> flags >> argument to allow callers to use the right gfp flags for the call >> context. > > I don't think that's a good idea. As I said, I want to clean up all > that in the mid term. > > As a matter of fact, there is already __pcim_intx() in pci/devres.c as > a pure unmanaged pci_intx() as a means to split and then cleanup the > APIs. Yeah. That naming is in fact confusing. __pcim_intx() should really be named pci_intx()... > One path towards getting the hybrid behavior out of pci_intx() could be > to rename __pcim_intx() to pci_intx_unmanaged() and port everyone who > uses pci_enable_device() + pci_intx() to that version. That would be > better than to have a third version with a gfp_t argument. Sounds good. But ideally, all users that rely on the managed variant should be converted to use pcim_intx() and pci_intx() changed to not call in devres. But that may be too much code churn (I have not checked). -- Damien Le Moal Western Digital Research