Re: [PATCH] PCI: Fix devres regression in pci_intx()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2024-09-04 at 06:57 -0600, Alex Williamson wrote:
> On Wed, 04 Sep 2024 09:06:37 +0200
> Philipp Stanner <pstanner@xxxxxxxxxx> wrote:
> 
> > On Tue, 2024-09-03 at 09:44 -0600, Alex Williamson wrote:
> > > On Thu, 25 Jul 2024 14:07:30 +0200
> > > Philipp Stanner <pstanner@xxxxxxxxxx> wrote:
> > >   
> > > > pci_intx() is a function that becomes managed if
> > > > pcim_enable_device()
> > > > has been called in advance. Commit 25216afc9db5 ("PCI: Add
> > > > managed
> > > > pcim_intx()") changed this behavior so that pci_intx() always
> > > > leads
> > > > to
> > > > creation of a separate device resource for itself, whereas
> > > > earlier,
> > > > a
> > > > shared resource was used for all PCI devres operations.
> > > > 
> > > > Unfortunately, pci_intx() seems to be used in some drivers'
> > > > remove()
> > > > paths; in the managed case this causes a device resource to be
> > > > created
> > > > on driver detach.
> > > > 
> > > > Fix the regression by only redirecting pci_intx() to its
> > > > managed
> > > > twin
> > > > pcim_intx() if the pci_command changes.
> > > > 
> > > > Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")  
> > > 
> > > I'm seeing another issue from this, which is maybe a more general
> > > problem with managed mode.  In my case I'm using vfio-pci to
> > > assign
> > > an
> > > ahci controller to a VM.  
> > 
> > "In my case" doesn't mean OOT, does it? I can't fully follow.
> 
> "OOT" Out Of Tree?  No, "In my case" is simply introducing the
> scenario
> in which I see the issue.  vfio-pci is an in-tree driver used to
> attach
> devices to userspace drivers, such as QEMU.  The ahci driver is
> loaded
> during system boot, setting the is_managed flag.  The ahci driver is
> then unbound from the device and the vfio-pci driver is bound.  The
> vfio-pci driver provides a uAPI for userspace drivers to operate a
> device in an an IOMMU protected context.

Alright, thx for the clarification.

>  
> > >   ahci_init_one() calls pcim_enable_device()
> > > which sets is_managed = true.  I notice that nothing ever sets
> > > is_managed to false.  Therefore now when I call pci_intx() from
> > > vfio-
> > > pci
> > > under spinlock, I get a lockdep warning  
> > 
> > I suppose you see the lockdep warning because the new pcim_intx()
> > can 
> > now allocate, whereas before 25216afc9db5 it was
> > pcim_enable_device()
> > which allocated *everything* related to PCI devres.
> > 
> > >  as I no go through pcim_intx()
> > > code after 25216afc9db5   
> > 
> > You alwas went through pcim_intx()'s logic. The issue seems to be
> > that
> > the allocation step was moved.
> 
> Unintentionally, yes, I believe so.  vfio-pci is not a managed,
> devres
> driver and therefore had no expectation of using the managed code
> path.

Yes, I agree this needs to be fixed through the solution you proposed.

> 
> > > since the previous driver was managed.  
> > 
> > what do you mean by "previous driver"?
> 
> As noted, the ahci driver is first bound to the device at boot,
> unbound, and the vfio-pci driver bound to the device.  The ahci
> driver
> is the previous driver.
> 
> > >   It seems
> > > like we should be setting is_managed to false is the driver
> > > release
> > > path, right?  
> > 
> > So the issue seems to be that the same struct pci_dev can be used
> > by
> > different drivers, is that correct?
> 
> Yes, and more generically, the driver release should undo everything
> that has been configured by the driver probe.
> 
> > If so, I think that can be addressed trough having
> > pcim_disable_device() set is_managed to false as you suggest.
> 
> If that's sufficient and drivers only call pcim_disable_device() in
> their release function.

pcim_disable_device() is not intended to be used directly by drivers.
It's basically the devres callback for pcim_enable_device() and is
called in everyone's release path automatically by devres.
(I agree that the naming is not superbe)

>   I also note that f748a07a0b64 ("PCI: Remove
> legacy pcim_release()") claims that:
> 
>   Thanks to preceding cleanup steps, pcim_release() is now not needed
>   anymore and can be replaced by pcim_disable_device(), which is the
>   exact counterpart to pcim_enable_device().
> 
> However, that's not accurate as pcim_enable_device() adds a devm
> action, unconditionally calls pci_enable_device() and sets is_managed
> to true.

It's not accurate in regards with is_managed.

The rest is fine IMO. The devres callback shall be added, and the
unconditional call to pci_enable_device() is also desired.

>   If we assume pcim_pin_device() is a valid concept, don't we
> still need to remove the devm action as well?

No. As pcim_disable_device() is the very devres callback, it does not
need to remove itself. Devres calls it once and then it's gone.

However, pcim_pin_device() IMO is not a valid concept. Who wants such
behavior IMO shall use pci_enable_device() and pci_disable_device()
manually.
I proposed removing it here:
https://lore.kernel.org/all/20240822073815.12365-2-pstanner@xxxxxxxxxx/

(Note that previously it could not be removed because
pcim_enable_device() also allocated all the stuff needed by
pci_request_region() etc.)

> 
> > Another solution can could at least consider would be to use a
> > GFP_ATOMIC for allocation in get_or_create_intx_devres().
> 
> If we look at what pci_intx() does without devres, it's simply
> reading
> and setting or clearing a bit in config space.  I can attest that a
> driver author would have no expectation that such a function
> allocates
> memory

A driver author would not have any expectation of a function implicitly
doing anything with devres. That's the reason why I did all this work
in the first place, to, ultimately, remove this hybrid behavior from
all pci_ functions.
So that devres functions are clearly marked with pcim_

That is, however, not that easy because everyone who uses
pcim_enable_device() in combination with pci_intx() first has to be
ported to a pci_intx()-version that has nothing to do with devres.


>  and there are scenarios where we want to call this with
> interrupts disabled, such as within an interrupt context.  So, TBH,
> it
> might make sense to consider whether an allocation in this path is
> appropriate at all, but I'm obviously no expert in devres.

The entire hybrid nature from pci_intx() should be removed.
I'm working on that.

The hybrid nature was always there, but the allocation was not. It
would be removed later together with the hybrid devres usage.

> 
> > I suppose your solution is the better one, though.
> 
> I see you've posted a patch, I'll test it as soon as I'm able. 

If it works from what I understand that should solve those issues for
now until we can phase out pci_intx() usage that relies on the device
resource.

---
btw. I just looked into the old code and found that this one also
already had a similar half-bug with is_managed. It never sets it to
false again, but pci_intx() runs into:

static struct pci_devres *find_pci_dr(struct pci_dev *pdev)
{
	if (pci_is_managed(pdev))
		return devres_find(&pdev->dev, pcim_release, NULL,
NULL);
	return NULL;
}


So in your case pci_is_managed() would have also been true and the only
reason no problem occurred is that devres_find() doesn't find the
device resource of the previous driver anymore, so pci_intx() won't use
that memory.
---


Thanks for debugging,
P.

> Thanks,
> 
> Alex
> 






[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux