On Fri, 30 Apr 2021 14:19:08 -0300 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Fri, Apr 30, 2021 at 02:31:40PM +0200, Cornelia Huck wrote: > > On Thu, 29 Apr 2021 15:13:47 -0300 > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > All the checks for !private need some kind of locking. The driver core > > > model is that the 'struct device_driver' callbacks are all called > > > under the device_lock (this prevents the driver unbinding during the > > > callback). I didn't check if ccs does this or not.. > > > > probe/remove/shutdown are basically a forward of the callbacks at the > > bus level. > > These are all covered by device_lock > > > The css bus should make sure that we serialize > > irq/sch_event/chp_event with probe/remove. > > Hum it doesn't look OK, like here: > > css_process_crw() > css_evaluate_subchannel() > sch = bus_find_device() > -- So we have a refcount on the struct device > css_evaluate_known_subchannel() { > if (sch->driver) { > if (sch->driver->sch_event) > ret = sch->driver->sch_event(sch, slow); > } > > But the above call and touches to sch->driver (which is really just > sch->dev.driver) are unlocked and racy. > > I would hold the device_lock() over all touches to sch->driver outside > of a driver core callback. I think this issue did not come up much before, as most drivers on the css bus tend to stay put during the lifetime of the device; but yes, it seems we're missing some locking. For the css bus, we need locking for the event callbacks; for irq, this may interact with the subchannel lock and likely needs some care. I also looked at the other busses in the common I/O layer: scm looks good at a glance, ccwgroup and ccw have locking for online/offline; the other callbacks for the ccw drivers probably need to take the device lock as well. Common I/O layer maintainers, does that look right?