On 5/3/21 12:54 PM, Cornelia Huck wrote:
On Fri, 30 Apr 2021 14:19:08 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
On Fri, Apr 30, 2021 at 02:31:40PM +0200, Cornelia Huck wrote:
On Thu, 29 Apr 2021 15:13:47 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
All the checks for !private need some kind of locking. The driver core
model is that the 'struct device_driver' callbacks are all called
under the device_lock (this prevents the driver unbinding during the
callback). I didn't check if ccs does this or not..
probe/remove/shutdown are basically a forward of the callbacks at the
bus level.
These are all covered by device_lock
The css bus should make sure that we serialize
irq/sch_event/chp_event with probe/remove.
Hum it doesn't look OK, like here:
css_process_crw()
css_evaluate_subchannel()
sch = bus_find_device()
-- So we have a refcount on the struct device
css_evaluate_known_subchannel() {
if (sch->driver) {
if (sch->driver->sch_event)
ret = sch->driver->sch_event(sch, slow);
}
But the above call and touches to sch->driver (which is really just
sch->dev.driver) are unlocked and racy.
I would hold the device_lock() over all touches to sch->driver outside
of a driver core callback.
I think this issue did not come up much before, as most drivers on the
css bus tend to stay put during the lifetime of the device; but yes, it
seems we're missing some locking.
For the css bus, we need locking for the event callbacks; for irq, this
may interact with the subchannel lock and likely needs some care.
I also looked at the other busses in the common I/O layer: scm looks
good at a glance, ccwgroup and ccw have locking for online/offline; the
other callbacks for the ccw drivers probably need to take the device
lock as well.
Common I/O layer maintainers, does that look right?
I just had a quick glance on the CIO layer drivers. And at first look,
you are right.
It looks likewe need modifications in the event callbacks (referring css
here)
Let me go thoughthis thoroughly and update.
Thank you.