On Tue, 21 Sep 2021 15:31:03 +0200
Vineeth Vijayan <vneethv@xxxxxxxxxxxxx> wrote:

> On Tue, 2021-09-21 at 05:25 +0200, Halil Pasic wrote:
> > On Mon, 20 Sep 2021 12:07:23 +0200
> > Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
> >
> > > On Mon, Sep 20 2021, Vineeth Vijayan <vneethv@xxxxxxxxxxxxx> wrote:
> > >
> > > > On Mon, 2021-09-20 at 00:39 +0200, Halil Pasic wrote:
> > > > > On Fri, 17 Sep 2021 10:40:20 +0200
> > > > > Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
> > > > >
> > > > ...snip...
> > > > > > > Thanks, if I find time for it, I will try to understand this
> > > > > > > better and come back with my findings.
> > > > > > >
> > > > > > > > > * Can virtio_ccw_remove() get called while !cdev->online
> > > > > > > > > and virtio_ccw_online() is running on a different cpu?
> > > > > > > > > If yes, what would happen then?
> > > > > > > >
> > > > > > > > All of the remove/online/... etc. callbacks are invoked via
> > > > > > > > the ccw bus code. We have to trust that it gets it correct :)
> > > > > > > > (Or have the common I/O layer maintainers double-check it.)
> > > > > > > >
> > > > > > >
> > > > > > > Vineeth, what is your take on this? Are the struct ccw_driver
> > > > > > > virtio_ccw_remove and the virtio_ccw_online callbacks mutually
> > > > > > > exclusive. Please notice that we may initiate the onlining by
> > > > > > > calling ccw_device_set_online() from a workqueue.
> > > > > > >
> > > > > > > @Conny: I'm not sure what is your definition of 'it gets it
> > > > > > > correct'... I doubt CIO can make things 100% foolproof in this
> > > > > > > area.
> > > > > > >
> > > > > > Not 100% foolproof, but "don't online a device that is in the
> > > > > > progress of going away" seems pretty basic to me.
> > > > > >
> > > > >
> > > > > I hope Vineeth will chime in on this.
> > > > Considering the online/offline processing,
> > > > The ccw_device_set_offline function or the online/offline is handled
> > > > inside device_lock. Also, the online_store function takes care of
> > > > avoiding multiple online/offline processing.
> > > >
> > > > Now, when we consider the unconditional remove of the device,
> > > > I am not familiar with the virtio_ccw driver. My assumptions are
> > > > based on how CIO/dasd drivers works. If i understand correctly, the
> > > > dasd driver sets different flags to make sure that a device_open is
> > > > getting prevented while the the device is in progress of offline-ing.
> > > >
> > > Hm, if we are invoking the online/offline callbacks under the device
> > > lock already,
> >
> > I believe we have a misunderstanding here. I believe that Vineeth is
> > trying to tell us, that online_store_handle_offline() and
> > online_store_handle_offline() are called under the a device lock of
> > the ccw device. Right, Vineeth?
> Yes. I wanted to bring-out both the scenario.The set_offline/_online()
> calls and the unconditional-remove call.

I don't understand the paragraph above. I can't map the terms
set_offline/_online() and unconditional-remove call to chunks of code.
:(

> For the set_online The virtio_ccw_online() also invoked with ccwlock
> held. (ref: ccw_device_set_online)

I don't think virtio_ccw_online() is invoked with the ccwlock held. I
think we call virtio_ccw_online() in this line:
https://elixir.bootlin.com/linux/v5.15-rc2/source/drivers/s390/cio/device.c#L394
and we have released the cdev->ccwlock literally 2 lines above.

> >
> > Conny, I believe, by online/offline callbacks, you mean
> > virtio_ccw_online() and virtio_ccw_offline(), right?
> >
> > But the thing is that virtio_ccw_online() may get called (and is
> > typically called, AFAICT) with no locks held via:
> > virtio_ccw_probe() --> async_schedule(virtio_ccw_auto_online, cdev)
> > -*-> virtio_ccw_auto_online(cdev) --> ccw_device_set_online(cdev) -->
> > virtio_ccw_online()
> >
> > Furthermore after a closer look, I believe because we don't take
> > a reference to the cdev in probe, we may get virtio_ccw_auto_online()
> > called with an invalid pointer (the pointer is guaranteed to be valid
> > in probe, but because of async we have no guarantee that it will be
> > called in the context of probe).
> >
> > Shouldn't we take a reference to the cdev in probe?
> We just had a quick look at the virtio_ccw_probe() function.
> Did you mean to have a get_device() during the probe() and put_device()
> just after the virtio_ccw_auto_online() ?

Yes, that would ensure that the cdev pointer is still valid when
virtio_ccw_auto_online() is executed, and that things are cleaned up
properly, I guess. But I'm not 100% sure about all the interactions.
AFAIR ccw_device_set_online(cdev) would bail out if !drv. But then we
have the case where we have already assigned the device to a new driver
(e.g. vfio for dasd).

BTW I believe if we have a problem here, the dasd driver has the same
problem as well. The code looks very, very similar.

And shouldn't this auto-online be common CIO functionality? What is the
reason the char devices don't seem to have it?

Regards,
Halil
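
P.S. To make the get_device()/put_device() idea a bit more concrete,
here is a minimal userspace sketch of the lifetime rule only. It is not
kernel code and not a proposed patch: toy_dev, toy_get(), toy_put(),
toy_async_schedule() and toy_run_pending() are made-up stand-ins for
struct device, get_device(), put_device(), async_schedule() and the
async worker. It is single-threaded on purpose, so it says nothing about
locking or about the !drv / rebind cases mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: just a reference count. */
struct toy_dev {
	int refcount;       /* models the device reference count */
	int online;         /* set by the deferred "auto online" step */
	int *release_count; /* bumped when the last reference is dropped */
};

static struct toy_dev *toy_get(struct toy_dev *d)
{
	d->refcount++;
	return d;
}

static void toy_put(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		++*d->release_count;
		free(d);
	}
}

/* One-slot deferred work queue, a crude model of async_schedule(). */
static void (*pending_fn)(void *);
static void *pending_data;

static void toy_async_schedule(void (*fn)(void *), void *data)
{
	pending_fn = fn;
	pending_data = data;
}

static void toy_run_pending(void)
{
	void (*fn)(void *) = pending_fn;

	pending_fn = NULL;
	if (fn)
		fn(pending_data);
}

static int refcount_seen_in_work;

/* Models virtio_ccw_auto_online(): runs later, outside of probe. */
static void toy_auto_online(void *data)
{
	struct toy_dev *d = data;

	refcount_seen_in_work = d->refcount; /* safe: we hold a reference */
	d->online = 1;
	toy_put(d); /* pairs with toy_get() in toy_probe() */
}

/* Models probe: pin the device before handing it to deferred work. */
static void toy_probe(struct toy_dev *d)
{
	toy_get(d);
	toy_async_schedule(toy_auto_online, d);
}

/* Returns 0 if the device stayed alive until the deferred work ran. */
int toy_demo(void)
{
	int released = 0;
	struct toy_dev *d = calloc(1, sizeof(*d));

	if (!d)
		return -1;
	d->refcount = 1; /* the reference held by the "bus" */
	d->release_count = &released;

	toy_probe(d);   /* refcount is now 2 */
	toy_put(d);     /* the "bus" lets go before the work has run */
	if (released != 0)
		return 1;   /* would mean the work got a dangling pointer */

	toy_run_pending(); /* work runs, then drops the last reference */
	if (refcount_seen_in_work != 1)
		return 2;
	return released == 1 ? 0 : 3;
}
```

Calling toy_demo() walks through the interesting ordering: the "bus"
drops its reference before the deferred work runs, and the object is
only released after the work has done its put. Without the toy_get() in
toy_probe(), the same ordering would hand toy_auto_online() a freed
pointer.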