On Fri, Mar 25, 2022 at 10:51:51AM -0500, Pierre-Louis Bossart wrote: > > > On 3/24/22 06:15, Greg Kroah-Hartman wrote: > > On Wed, Mar 23, 2022 at 02:45:59PM -0500, Pierre-Louis Bossart wrote: > > > Hi, > > > I could use feedback/guidance on a possible conceptual bug in the SoundWire > > > probe and bus handling. > > > > > > When we probe a driver, the code does this: > > > > > > static int sdw_drv_probe(struct device *dev) > > > { > > > struct sdw_slave *slave = dev_to_sdw_dev(dev); > > > struct sdw_driver *drv = drv_to_sdw_driver(dev->driver); > > > const struct sdw_device_id *id; > > > const char *name; > > > int ret; > > > > > > /* > > > * fw description is mandatory to bind > > > */ > > > if (!dev->fwnode) > > > return -ENODEV; > > > > > > if (!IS_ENABLED(CONFIG_ACPI) && !dev->of_node) > > > return -ENODEV; > > > > > > id = sdw_get_device_id(slave, drv); > > > if (!id) > > > return -ENODEV; > > > > > > slave->ops = drv->ops; > > > > That is wrong and should never happen as you lost all reference > > counting. Please don't do that. > > ok, so I think we all agree on the issue. It's not new code, it's been there > since December 2017 and 4.16 It's hard to notice that in code review :( > > > The last line is the problematic one. If at some point, the user does an > > > rmmod and unbinds the SoundWire codec driver, the .remove will be called and > > > the 'drv' will no longer be valid, but we will still have a reference to > > > drv->ops and use that pointer in the bus code, e.g. > > > > > > /* Update the Slave driver */ > > > if (slave_notify && slave->ops && > > > slave->ops->interrupt_callback) { > > > slave_intr.sdca_cascade = sdca_cascade; > > > slave_intr.control_port = clear; > > > memcpy(slave_intr.port, &port_status, > > > sizeof(slave_intr.port)); > > > > > > slave->ops->interrupt_callback(slave, > > > &slave_intr); > > > } > > > > > > I noodled with a potential fix in > > > https://github.com/thesofproject/linux/pull/3534/commits/82d64fb0fd39b532263f060a8ec86e47e9ab305b > > > > > > where I force-reset this slave->ops pointer, but it is likely to be very > > > racy. > > > > Just properly reference count things and this should be ok. You can > > NEVER just save off a pointer to a random thing that you do not own > > without increasing the reference count, otherwise bad things will > > happen. It's always been this way. > > but I am not following the reference count recommendation in that specific > case. If we increase a reference count in the probe, wouldn't that prevent > the unbind/remove? Depends on what you are increasing the reference count of :) bind/unbind has nothing to do with the reference count of objects, it only prevents the devices/whatever from being removed from the system (hopefully.) When doing "give me the driver ops of that driver over there" you have to be VERY careful. It's a very uncommon pattern that I can't think of anyone else doing outside of a bus logic. I don't recommend it, instead grab references to the devices themselves and go through the device and what is bound/unbound to that. Also keep things in sync when you grab those other devices, that can also get messy. How about just not doing this at all?