Re: Why set .suppress_bind_attrs even though .remove() implemented?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 28, 2022 at 02:17:01PM +0200, Johan Hovold wrote:
> On Wed, Jul 27, 2022 at 02:57:16PM -0500, Bjorn Helgaas wrote:
> > On Tue, Jul 26, 2022 at 11:56:59AM +0200, Johan Hovold wrote:
> > > On Mon, Jul 25, 2022 at 06:35:27PM +0100, Marc Zyngier wrote:
> > > > On Mon, 25 Jul 2022 16:18:48 +0100,
> > > > Johan Hovold <johan@xxxxxxxxxx> wrote:
> > > 
> > > > > Since when is unloading modules something that is expected to
> > > > > work perfectly? I keep hearing "well, don't do that then" when
> > > > > someone complains about unloading this module while doing this
> > > > > or that broke something. (And it's only root that can unload
> > > > > modules in the first place.)
> > > > 
> > > > Well, maybe I have higher standards. For the stuff I maintain, I
> > > > now point-blank refuse to support module unloading if this can
> > > > result in a crash. Or worse.
> > > 
> > > That makes sense for regular interrupt controllers where its hard to
> > > tell that all consumers are gone, but I don't think that should
> > > limit the usefulness of having modular PCI controller drivers where
> > > we know that the consumers are gone after deregistering the bus
> > > (i.e. the consumers are descendants of the controller in the device
> > > tree).
> > 
> > Those consumers are endpoint drivers, so I think this depends on those
> > drivers correctly unmapping the interrupts they use, right?
> 
> Right. For MSI this means that pci_alloc_irq_vectors() in probe should
> be matched by pci_free_irq_vectors() on remove.
> 
> For legacy interrupts, which can be shared, the mapping is created by
> PCI core when binding to the first device and can only be disposed by
> the host-bridge driver once all descendants have been removed.
> 
> The endpoint drivers still need to disable their interrupts of course.
> 
> Buggy endpoint-driver remove implementations can lead to all sorts of
> crashes (e.g. after failing to deregister a class device), and if that's
> a worry then don't unload modules (and possibly disable it completely
> using CONFIG_MODULE_UNLOAD).
> 
> > > > > It's useful for developers, but use it at your own risk.
> > > > > 
> > > > > That said, I agree that if something is next to impossible to
> > > > > get right, as may be the case with interrupt controllers
> > > > > generally, then fine, let's disable module unloading for that
> > > > > class of drivers.
> > > > > 
> > > > > And this would mean disabling driver unbind for the 20+ driver
> > > > > PCI drivers that currently implement it to some degree.
> > > > 
> > > > That would be Bjorn's and Lorenzo's call.
> > > 
> > > Sure, but I think it would be the wrong decision here. Especially,
> > > since the end result will likely just be that more drivers will
> > > become always compiled-in.
> > 
> > Can you elaborate on this?  I think Marc is suggesting that these PCI
> > controller drivers be modular but not removable.  Why would that cause
> > more of them to be compiled-in?
> 
> As mentioned earlier in this thread, we only appear to have some 60
> drivers in the entire tree that bother to try to implement that. I fear
> that blocking the use of modules (including being able to unload them)
> will just make people submit drivers that can only be built in.
> 
> Not everyone cares about Android's GKI, but being able to unload a
> module during development is very useful (and keeping that out-of-tree
> prevents sharing the implementation and make it susceptible to even
> further bit rot).
> 
> So continuing to supporting modules properly is a win for everyone (e.g.
> GKI and developers).
>  
> > > > > > > Turns out the pcie-qcom driver does not support legacy
> > > > > > > interrupts so there's no risk of there being any lingering
> > > > > > > mappings if I understand things correctly.
> > > > > > 
> > > > > > It still does MSIs, thanks to dw_pcie_host_init(). If you can
> > > > > > remove the driver while devices are up and running with MSIs
> > > > > > allocated, things may get ugly if things align the wrong way
> > > > > > (if a driver still has a reference to an irq_desc or irq_data,
> > > > > > for example).
> > > > > 
> > > > > That is precisely the way I've been testing it and everything
> > > > > appears to be tore down as it should.
> > > > >
> > > > > And a PCI driver that has been unbound should have released its
> > > > > resources, or that's a driver bug. Right?
> > > > 
> > > > But that's the thing: you can easily remove part of the
> > > > infrastructure without the endpoint driver even noticing. It may
> > > > not happen in your particular case if removing the RC driver will
> > > > also nuke the endpoints in the process, but I can't see this is an
> > > > absolute guarantee. The crash pointed to by an earlier email is
> > > > symptomatic of it.
> > > 
> > > But that was arguably due to a driver bug, which we know how to fix.
> > > For MSIs the endpoint driver will free its interrupts and all is
> > > good.
> > > 
> > > The key observation is that the driver model will make sure that any
> > > endpoint drivers have been unbound before the bus is deregistered.
> > > 
> > > That means there are no longer any consumers of the interrupts,
> > > which can be disposed. For MSI this is handled by
> > > pci_free_irq_vectors() when unbinding the endpoint drivers. For
> > > legacy interrupts, which can be shared, the PCIe RC driver needs to
> > > manage this itself after the consumers are gone.
> > 
> > The driver model ensures that endpoint drivers have been unbound. But
> > doesn't the interrupt disposal depend on the correct functioning of
> > those endpoint drivers?  So if a buggy endpoint driver failed to
> > dispose of them, we're still vulnerable?
> 
> Just as you are if an endpoint-driver fails to clean up after itself in
> some other way (e.g. leaves the interrupt enabled).
> 

The IRQ disposal issue should hopefully fixed by this series:
https://lore.kernel.org/linux-pci/20240715114854.4792-3-kabel@xxxxxxxxxx/

Then if the dwc driver calls pci_remove_irq_domain() instead of
irq_domain_remove(), we can be sure that all the IRQs are disposed during the
driver remove.

So can we proceed with the series making Qcom driver modular?

- Mani

-- 
மணிவண்ணன் சதாசிவம்




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux