On Fri, 2021-04-23 at 15:23 +0200, Halil Pasic wrote: > On Fri, 23 Apr 2021 13:06:16 +0200 > Cornelia Huck <cohuck@xxxxxxxxxx> wrote: > > > On Thu, 22 Apr 2021 16:49:21 -0400 > > Eric Farman <farman@xxxxxxxxxxxxx> wrote: > > > > > On Thu, 2021-04-22 at 02:52 +0200, Halil Pasic wrote: > > > > On Tue, 13 Apr 2021 20:24:06 +0200 > > > > Eric Farman <farman@xxxxxxxxxxxxx> wrote: > > > > > > > > > Hi Conny, Halil, > > > > > > > > > > Let's restart our discussion about the collision between > > > > > interrupts > > > > > for > > > > > START SUBCHANNEL and HALT/CLEAR SUBCHANNEL. It's been a > > > > > quarter > > > > > million > > > > > minutes (give or take), so here is the problematic scenario > > > > > again: > > > > > > > > > > CPU 1 CPU 2 > > > > > 1 CLEAR SUBCHANNEL > > > > > 2 fsm_irq() > > > > > 3 START SUBCHANNEL > > > > > 4 vfio_ccw_sch_io_todo() > > > > > 5 fsm_irq() > > > > > 6 vfio_ccw_sch_io_todo() > > > > > > > > > > From the channel subsystem's point of view the CLEAR > > > > > SUBCHANNEL > > > > > (step 1) > > > > > is complete once step 2 is called, as the Interrupt Response > > > > > Block > > > > > (IRB) > > > > > has been presented and the TEST SUBCHANNEL was driven by the > > > > > cio > > > > > layer. > > > > > Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets > > > > > a > > > > > cc=0 to > > > > > indicate the I/O was accepted. However, step 2 stacks the > > > > > bulk of > > > > > the > > > > > actual work onto a workqueue for when the subchannel lock is > > > > > NOT > > > > > held, > > > > > and is unqueued at step 4. That code misidentifies the data > > > > > in the > > > > > IRB > > > > > as being associated with the newly active I/O, and may > > > > > release > > > > > memory > > > > > that is actively in use by the channel subsystem and/or > > > > > device. > > > > > Eww. > > > > > > > > > > In this version... > > > > > > > > > > Patch 1 and 2 are defensive checks. Patch 2 was part of v3 > > > > > [2], but > > > > > I > > > > > would love a better option here to guard between steps 2 and > > > > > 4. > > > > > > > > > > Patch 3 is a subset of the removal of the CP_PENDING FSM > > > > > state in > > > > > v3. > > > > > I've obviously gone away from this idea, but I thought this > > > > > piece > > > > > is > > > > > still valuable. > > > > > > > > > > Patch 4 collapses the code on the interrupt path so that > > > > > changes to > > > > > the FSM state and the channel_program struct are handled at > > > > > the > > > > > same > > > > > point, rather than separated by a mutex boundary. Because of > > > > > the > > > > > possibility of a START and HALT/CLEAR running concurrently, > > > > > it does > > > > > not make sense to split them here. > > > > > > > > > > With the above patches, maybe it then makes sense to hold the > > > > > io_mutex > > > > > across the entirety of vfio_ccw_sch_io_todo(). But I'm not > > > > > completely > > > > > sure that would be acceptable. > > > > > > > > > > So... Thoughts? > > > > > > > > I believe we should address > > > > > > Who is the "we" here? > > > > > > > the concurrency, encapsulation and layering > > > > issues in the subchannel/ccw pass-through code (vfio-ccw) by > > > > taking a > > > > holistic approach as soon as possible. > > > > Let me also ask: what is "holistic"? If that's a complete rewrite, > > I > > definitely don't have the capacity for that; if others want to take > > over the code, feel free. > > Ditto. The idea of a rewrite has come up in the past, and I still don't see how that's a good use of time/resources. Looking at the fixes and improvements from the last couple of years, I feel good about the current components, their design, and their handshaking. > > In general: https://en.wikipedia.org/wiki/Holism > > In this context I mean: > * Fix all data races in in the vfio-ccw module instead of making the > "race window" smaller. Reasoning about the behavior of racy programs > is very difficult. > * The passed-through subchannel of the VM, as seen by the guest OS is > an > overlay of the host subchannel (which we have to assume is within the > specification), the vfio-ccw kernel module, and an userspace > emulator. > The interface between the kernel module and the userspace emulator is > something the authors of the vfio-ccw kernel module design, and while > doing so we have to think about the interface the whole solution > needs > to implemnet. For example we should ask ourselves: what is the right > response in kernel when we encounter the situation described by the > steps 1-3 of Eric's scenario. Our VMs subchannel needs to reward the > SSC with a cc=2 if the subchannel has the clear FC bit set. If we > detect > the described condition, does it mean that the userspace emulator is > broken? Or is the userspace emulator allowed to rely on the vfio-ccw > kernel module to detect this condition and return an -EBUSY (which > corresponds to cc=2 because that is apart of the definition of the > interface between the kernel and the userspace)? When is the FC bit > of our VMs subchannel cleared? > I read patch 2 like it is trying to catch > the condition and return an -EBUSY, but I don't see it catching all > the possible cases. I.e. what if another CPU is executing the first > instruction of vfio_ccw_sch_io_todo() when we check > work_pending(&private->io_work) in fsm_io_helper()? This last hunk is part of the concern that Conny raised in reply to Patch 2, which we have an idea on how to pursue. Eric > > [..] > > > > > > > > Moreover patch 4 seems to rely on > > > > private->state which, AFAIR is still used in a racy fashion. > > > > > > > > But if strong empirical evidence shows that it performs better > > > > (stops > > > > the bleeding), I think we can go ahead with it. > > > > > > Again with the bleeding. Is there a Doctor in the house? :) > > > > No idea, seen any blue boxes around? :) > > > > Let me also ask what: blue boxes do you mean? If you mean > https://en.wikipedia.org/wiki/Blue_box > then, I'm not sure I can follow your association. Are you looking for > phone to call a doctor? > > Regards, > Halil