On Wed, 23 May 2018 14:29:28 +0200
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:

> On 05/23/2018 10:56 AM, Cornelia Huck wrote:
> > On Tue, 22 May 2018 12:38:29 -0600
> > Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
> >
> >> On Tue, 22 May 2018 19:17:07 +0200
> >> Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:
> >>
> >>> From vfio-ccw perspective I join Connie's assessment: vfio-ccw should
> >>> be fine with these changes. I'm however not too deeply involved with
> >>> the mdev framework, thus I don't feel comfortable r-b-ing. That results
> >>> in
> >>> Acked-by: Halil Pasic <pasic@xxxxxxxxxxxxx>
> >>> for both patches.
> >>>
> >>> While at it I have would like to ask about the semantics and intended
> >>> use of the mdev interfaces.
> >>>
> >>> static int vfio_ccw_sch_probe(struct subchannel *sch)
> >>> {
> >>>
> >>> /* HALIL: 8< Not so interesting stuff happens here. >8 */
> >>
> >> This was interesting:
> >>
> >> private->state = VFIO_CCW_STATE_NOT_OPER;
> >>
> >>> 	ret = vfio_ccw_mdev_reg(sch);
> >>> 	if (ret)
> >>> 		goto out_disable;
> >>> 	/*
> >>> 	 * HALIL:
> >>> 	 * This might be racy. Somewhere in vfio_ccw_mdev_reg() the create attribute
> >>> 	 * is made available (it calls mdev_register_device()). For instance create will
> >>> 	 * attempt to decrement private->avail which is initialized below. I fail to
> >>> 	 * understand how is this well synchronized.
> >>> 	 */
> >>> 	INIT_WORK(&private->io_work, vfio_ccw_sch_io_todo);
> >>> 	atomic_set(&private->avail, 1);
> >>> 	private->state = VFIO_CCW_STATE_STANDBY;
> >>>
> >>> 	return 0;
> >>>
> >>> out_disable:
> >>> 	cio_disable_subchannel(sch);
> >>> out_free:
> >>> 	dev_set_drvdata(&sch->dev, NULL);
> >>> 	kfree(private);
> >>> 	return ret;
> >>> }
> >>>
> >>> Should not initialization of go before mdev_register_device(), and then rolled
> >>> back if necessary if mdev_register_device() fails?
> >>>
> >>> In practice it does not seem very likely that userspace can trigger
> >>> mdev_device_create() before vfio_ccw_sch_probe() finishes so it should
> >>> not be a practical problem. But I would like to understand how synchronization
> >>> is supposed to work.
> >>>
> >>> [Added Dong Jia, maybe he is also able to answer my question.]
> >>
> >> vfio_ccw_mdev_create() requires that private->state is not
> >> VFIO_CCW_STATE_NOT_OPER but vfio_ccw_sch_probe() explicitly sets state
> >> to this value before calling vfio_ccw_mdev_reg(), so a create should
> >> return -ENODEV if racing with parent registration. Is there something
> >> else that I'm missing? Thanks,
> >>
>
>
> Disclaimer: I did not do much kernel work up until now. I still have
> much to learn.
>
> I mostly agree with your analysis but I'm not sure if the conclusion should be
> 'and thus everything is good' or 'and thus indeed we do have a race, a
> poorly handled one'.

Let me throw in that there is more than one way to handle a race, and
one of them is to return an error if something happens at an
inconvenient time :)

>
> One thing I'm not sure about is: can atomic_set(&private->avail, 1) and
> private->state = VFIO_CCW_STATE_STANDBY be perceived as reordered by
> e.g. some other cpu and thus vfio_ccw_mdev_create() or not. I tried to
> figure it out based on Documentation/atomic_t.txt but was not very successful.
> If these can be reordered we could observe -EPERM instead of -ENODEV, I
> think.

I don't think that matters (see below).

>
> Furthermore from your analysis I deduce that the client code (I think mdev
> calls it vendor code) may rely on mdev_register_device() containing a
> (RELEASE) barrier. We use a mutex in there so the barrier is there. And
> the client code may rely on a (ACQUIRE) barrier before the create callback
> is called. That should also be true and was true in the past too again because
> of mutex usage.
>
>
> >> Alex
> >
> > No, I think your understanding is correct. We move the state from
> > NOT_OPER to STANDBY only after we're set up completely, so our create
> > callback will simply fail early with -ENODEV. This looks fine to me.
> >
>
> This -ENODEV looks strange to me. Which device does not exist? The
> userspace were supposed to retry on this? It's not even -EAGAIN. Is it
> documented somewhere?

-ENODEV looks very reasonable if we consider a device in the NOT_OPER
state.

>
> If it's unavoidable (which I don't see why) I would prefer -EAGAIN. I
> think throwing an -ENODEV at our userspace once in a blue moon (if ever)
> because that is the way we 'handle' races in our code instead of avoiding
> them is not very friendly.
>
> And I'm not sure -EPERM is not possible (see my statement
> about reordering of the writes above).

I don't think the actual return code does matter in this case. User
space must be prepared for an error (and -ENODEV was even possible
before, see the discussion in the v3 thread.) We're dealing with a hard
to trigger corner case that is easily handled by user space here: let's
not overthink this.
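
For readers following the thread, the ordering under discussion can be modelled in plain user-space C. This is only a rough sketch, not the driver code: every model_* name is hypothetical, the starting value of avail is assumed to be zero, and the release/acquire pairing on the state field is an assumption of the model (the thread only argues that the mutex inside mdev_register_device() provides the barriers). It shows a create that arrives before probe has finished failing with -ENODEV, and a second create after the single slot is taken failing with -EPERM.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical user-space model of the states named in the thread. */
enum model_state { MODEL_NOT_OPER, MODEL_STANDBY };

struct model_private {
	atomic_int state;	/* holds an enum model_state value */
	atomic_int avail;	/* number of free slots, 0 or 1 */
};

/* First half of probe: state is NOT_OPER before the parent is registered. */
static void model_probe_begin(struct model_private *p)
{
	atomic_init(&p->avail, 0);	/* assumption: zero until set up */
	atomic_init(&p->state, MODEL_NOT_OPER);
	/* The real probe would call vfio_ccw_mdev_reg() at this point. */
}

/* Second half of probe: publish avail first, then flip the state. */
static void model_probe_finish(struct model_private *p)
{
	atomic_store_explicit(&p->avail, 1, memory_order_relaxed);
	/* Assumption: a release store, so avail is visible once state is. */
	atomic_store_explicit(&p->state, MODEL_STANDBY, memory_order_release);
}

/* Model of the create callback: check the state, then take the slot. */
static int model_create(struct model_private *p)
{
	int expected = 1;

	if (atomic_load_explicit(&p->state, memory_order_acquire) ==
	    MODEL_NOT_OPER)
		return -ENODEV;	/* racing with parent registration */
	if (!atomic_compare_exchange_strong(&p->avail, &expected, 0))
		return -EPERM;	/* the single slot is already taken */
	return 0;
}

/* Runs the sequence from the thread; returns 1 if it behaves as described. */
static int model_demo(void)
{
	struct model_private p;
	int early, first, second;

	model_probe_begin(&p);
	early = model_create(&p);	/* arrives before probe finishes */
	model_probe_finish(&p);
	first = model_create(&p);	/* takes the only slot */
	second = model_create(&p);	/* nothing left to take */

	return early == -ENODEV && first == 0 && second == -EPERM;
}
```

Calling model_demo() from a small main() is enough to try it out. The model is single-threaded on purpose: it only illustrates which error the state check produces at each point, not whether the real driver's plain store to private->state can be observed out of order.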