On Wed, 2024-09-04 at 06:57 -0600, Alex Williamson wrote: > On Wed, 04 Sep 2024 09:06:37 +0200 > Philipp Stanner <pstanner@xxxxxxxxxx> wrote: > > > On Tue, 2024-09-03 at 09:44 -0600, Alex Williamson wrote: > > > On Thu, 25 Jul 2024 14:07:30 +0200 > > > Philipp Stanner <pstanner@xxxxxxxxxx> wrote: > > > > > > > pci_intx() is a function that becomes managed if > > > > pcim_enable_device() > > > > has been called in advance. Commit 25216afc9db5 ("PCI: Add > > > > managed > > > > pcim_intx()") changed this behavior so that pci_intx() always > > > > leads > > > > to > > > > creation of a separate device resource for itself, whereas > > > > earlier, > > > > a > > > > shared resource was used for all PCI devres operations. > > > > > > > > Unfortunately, pci_intx() seems to be used in some drivers' > > > > remove() > > > > paths; in the managed case this causes a device resource to be > > > > created > > > > on driver detach. > > > > > > > > Fix the regression by only redirecting pci_intx() to its > > > > managed > > > > twin > > > > pcim_intx() if the pci_command changes. > > > > > > > > Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()") > > > > > > I'm seeing another issue from this, which is maybe a more general > > > problem with managed mode. In my case I'm using vfio-pci to > > > assign > > > an > > > ahci controller to a VM. > > > > "In my case" doesn't mean OOT, does it? I can't fully follow. > > "OOT" Out Of Tree? No, "In my case" is simply introducing the > scenario > in which I see the issue. vfio-pci is an in-tree driver used to > attach > devices to userspace drivers, such as QEMU. The ahci driver is > loaded > during system boot, setting the is_managed flag. The ahci driver is > then unbound from the device and the vfio-pci driver is bound. The > vfio-pci driver provides a uAPI for userspace drivers to operate a > device in an an IOMMU protected context. Alright, thx for the clarification. > > > > ahci_init_one() calls pcim_enable_device() > > > which sets is_managed = true. I notice that nothing ever sets > > > is_managed to false. Therefore now when I call pci_intx() from > > > vfio- > > > pci > > > under spinlock, I get a lockdep warning > > > > I suppose you see the lockdep warning because the new pcim_intx() > > can > > now allocate, whereas before 25216afc9db5 it was > > pcim_enable_device() > > which allocated *everything* related to PCI devres. > > > > > as I no go through pcim_intx() > > > code after 25216afc9db5 > > > > You alwas went through pcim_intx()'s logic. The issue seems to be > > that > > the allocation step was moved. > > Unintentionally, yes, I believe so. vfio-pci is not a managed, > devres > driver and therefore had no expectation of using the managed code > path. Yes, I agree this needs to be fixed through the solution you proposed. > > > > since the previous driver was managed. > > > > what do you mean by "previous driver"? > > As noted, the ahci driver is first bound to the device at boot, > unbound, and the vfio-pci driver bound to the device. The ahci > driver > is the previous driver. > > > > It seems > > > like we should be setting is_managed to false is the driver > > > release > > > path, right? > > > > So the issue seems to be that the same struct pci_dev can be used > > by > > different drivers, is that correct? > > Yes, and more generically, the driver release should undo everything > that has been configured by the driver probe. > > > If so, I think that can be addressed trough having > > pcim_disable_device() set is_managed to false as you suggest. > > If that's sufficient and drivers only call pcim_disable_device() in > their release function. pcim_disable_device() is not intended to be used directly by drivers. It's basically the devres callback for pcim_enable_device() and is called in everyone's release path automatically by devres. (I agree that the naming is not superbe) > I also note that f748a07a0b64 ("PCI: Remove > legacy pcim_release()") claims that: > > Thanks to preceding cleanup steps, pcim_release() is now not needed > anymore and can be replaced by pcim_disable_device(), which is the > exact counterpart to pcim_enable_device(). > > However, that's not accurate as pcim_enable_device() adds a devm > action, unconditionally calls pci_enable_device() and sets is_managed > to true. It's not accurate in regards with is_managed. The rest is fine IMO. The devres callback shall be added, and the unconditional call to pci_enable_device() is also desired. > If we assume pcim_pin_device() is a valid concept, don't we > still need to remove the devm action as well? No. As pcim_disable_device() is the very devres callback, it does not need to remove itself. Devres calls it once and then it's gone. However, pcim_pin_device() IMO is not a valid concept. Who wants such behavior IMO shall use pci_enable_device() and pci_disable_device() manually. I proposed removing it here: https://lore.kernel.org/all/20240822073815.12365-2-pstanner@xxxxxxxxxx/ (Note that previously it could not be removed because pcim_enable_device() also allocated all the stuff needed by pci_request_region() etc.) > > > Another solution can could at least consider would be to use a > > GFP_ATOMIC for allocation in get_or_create_intx_devres(). > > If we look at what pci_intx() does without devres, it's simply > reading > and setting or clearing a bit in config space. I can attest that a > driver author would have no expectation that such a function > allocates > memory A driver author would not have any expectation of a function implicitly doing anything with devres. That's the reason why I did all this work in the first place, to, ultimately, remove this hybrid behavior from all pci_ functions. So that devres functions are clearly marked with pcim_ That is, however, not that easy because everyone who uses pcim_enable_device() in combination with pci_intx() first has to be ported to a pci_intx()-version that has nothing to do with devres. > and there are scenarios where we want to call this with > interrupts disabled, such as within an interrupt context. So, TBH, > it > might make sense to consider whether an allocation in this path is > appropriate at all, but I'm obviously no expert in devres. The entire hybrid nature from pci_intx() should be removed. I'm working on that. The hybrid nature was always there, but the allocation was not. It would be removed later together with the hybrid devres usage. > > > I suppose your solution is the better one, though. > > I see you've posted a patch, I'll test it as soon as I'm able. If it works from what I understand that should solve those issues for now until we can phase out pci_intx() usage that relies on the device resource. --- btw. I just looked into the old code and found that this one also already had a similar half-bug with is_managed. It never sets it to false again, but pci_intx() runs into: static struct pci_devres *find_pci_dr(struct pci_dev *pdev) { if (pci_is_managed(pdev)) return devres_find(&pdev->dev, pcim_release, NULL, NULL); return NULL; } So in your case pci_is_managed() would have also been true and the only reason no problem occurred is that devres_find() doesn't find the device resource of the previous driver anymore, so pci_intx() won't use that memory. --- Thanks for debugging, P. > Thanks, > > Alex >