Re: [RFC PATCH v4 0/4] vfio-ccw: Fix interrupt handling for HALT/CLEAR

Eric Farman <farman@xxxxxxxxxxxxx> · Thu, 22 Apr 2021 16:49:21 -0400

On Thu, 2021-04-22 at 02:52 +0200, Halil Pasic wrote:
> On Tue, 13 Apr 2021 20:24:06 +0200
> Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> 
> > Hi Conny, Halil,
> > 
> > Let's restart our discussion about the collision between interrupts
> > for
> > START SUBCHANNEL and HALT/CLEAR SUBCHANNEL. It's been a quarter
> > million
> > minutes (give or take), so here is the problematic scenario again:
> > 
> > 	CPU 1			CPU 2
> >  1	CLEAR SUBCHANNEL
> >  2	fsm_irq()
> >  3				START SUBCHANNEL
> >  4	vfio_ccw_sch_io_todo()
> >  5				fsm_irq()
> >  6				vfio_ccw_sch_io_todo()
> > 
> > From the channel subsystem's point of view the CLEAR SUBCHANNEL
> > (step 1)
> > is complete once step 2 is called, as the Interrupt Response Block
> > (IRB)
> > has been presented and the TEST SUBCHANNEL was driven by the cio
> > layer.
> > Thus, the START SUBCHANNEL (step 3) is submitted [1] and gets a
> > cc=0 to
> > indicate the I/O was accepted. However, step 2 stacks the bulk of
> > the
> > actual work onto a workqueue for when the subchannel lock is NOT
> > held,
> > and is unqueued at step 4. That code misidentifies the data in the
> > IRB
> > as being associated with the newly active I/O, and may release
> > memory
> > that is actively in use by the channel subsystem and/or device.
> > Eww.
> > 
> > In this version...
> > 
> > Patch 1 and 2 are defensive checks. Patch 2 was part of v3 [2], but
> > I
> > would love a better option here to guard between steps 2 and 4.
> > 
> > Patch 3 is a subset of the removal of the CP_PENDING FSM state in
> > v3.
> > I've obviously gone away from this idea, but I thought this piece
> > is
> > still valuable.
> > 
> > Patch 4 collapses the code on the interrupt path so that changes to
> > the FSM state and the channel_program struct are handled at the
> > same
> > point, rather than separated by a mutex boundary. Because of the
> > possibility of a START and HALT/CLEAR running concurrently, it does
> > not make sense to split them here.
> > 
> > With the above patches, maybe it then makes sense to hold the
> > io_mutex
> > across the entirety of vfio_ccw_sch_io_todo(). But I'm not
> > completely
> > sure that would be acceptable.
> > 
> > So... Thoughts?
> 
> I believe we should address

Who is the "we" here?

>  the concurrency, encapsulation and layering
> issues in the subchannel/ccw pass-through code (vfio-ccw) by taking a
> holistic approach as soon as possible.
> 
> I find the current state of art very hard to reason about, and that
> adversely  affects my ability to reason about attempts at partial
> improvements.
> 
> I understand that such a holistic approach needs a lot of work, and
> we
> may have to stop some bleeding first. In the stop the bleeding phase
> we
> can take a pragmatic approach and accept changes that empirically
> seem to
> work towards stopping the bleeding. I.e. if your tests say it's
> better,
> I'm willing to accept that it is better.

So much bleeding!

RE: my tests... I have only been seeing the described problem in
pathological tests, and this series lets those tests run without issue.

> 
> I have to admit, I don't understand how synchronization is done in
> the
> vfio-ccw kernel module (in the sense of avoiding data races).
> 
> Regarding your patches, I have to admit, I have a hard time figuring
> out
> which one of these (or what combination of them) is supposed to solve
> the problem you described above. If I had to guess, I would guess it
> is
> either patch 4, because it has a similar scenario diagram in the
> commit message like the one in the problem statement. Is my guess
> right?

Sort of. It is true that Patch 4 is the last piece of the puzzle, and
the diagram is included in that commit message so it is kept with the
change, instead of being lost with the cover letter.

As I said in the cover letter, "Patch 1 and 2 are defensive checks"
which are simply included to provide a more robust solution. You could
argue that Patch 3 should be held out separately, but as it came from
the previous version of this series it made sense to include here.

> 
> If it is right I don't quite understand the mechanics of the fix,
> because what the patch seems to do is changing the content of step 4
> in
> the above diagram. And I don't see how is change that code
> so that it does not "misidentifies the data in the IRB as being
> associated with the newly active I/O". 

Consider that the cp_update_scsw() and cp_free() routines that get
called here are looking at the cp->initialized flag to determine
whether to perform any work. For a system that is otherwise idle, the
cp->initialized flag will be false when processing an IRB related to a
CSCH, meaning the bulk of this routine will be a NOP.

In the failing scenario, as I describe in the commit message for patch
4, we could be processing an interrupt that is unaffiliated with the CP
that was (or is being) built. It need not even be a solicited
interrupt; it just happened that the CSCH interrupt is what got me
looking at this path. The whole situation boils down to the FSM state
and cp->initialized flag being out of sync from one another after
coming through this function.

> Moreover patch 4 seems to rely on
> private->state which, AFAIR is still used in a racy fashion.
> 
> But if strong empirical evidence shows that it performs better (stops
> the bleeding), I think we can go ahead with it.

Again with the bleeding. Is there a Doctor in the house? :)

Eric

> 
> Regards,
> Halil
> 
> 
> 
> 
> 
> 
>