Re: [PATCH v2 2/5] vfio-ccw: concurrent I/O handling

Halil Pasic <pasic@xxxxxxxxxxxxx> · Fri, 25 Jan 2019 15:01:01 +0100

On Fri, 25 Jan 2019 13:58:35 +0100
Cornelia Huck <cohuck@xxxxxxxxxx> wrote:

> On Fri, 25 Jan 2019 11:24:37 +0100
> Cornelia Huck <cohuck@xxxxxxxxxx> wrote:
> 
> > On Thu, 24 Jan 2019 21:37:44 -0500
> > Eric Farman <farman@xxxxxxxxxxxxx> wrote:
> > 
> > > On 01/24/2019 09:25 PM, Eric Farman wrote:  
> > > > 
> > > > 
> > > > On 01/21/2019 06:03 AM, Cornelia Huck wrote:    
> > 
> > > > [1] I think these changes are cool.  We end up going into (and staying 
> > > > in) state=BUSY if we get cc=0 on the SSCH, rather than in/out as we 
> > > > bumble along.
> > > > 
> > > > But why can't these be separated out from this patch?  It does change 
> > > > the behavior of the state machine, and seem distinct from the addition 
> > > > of the mutex you otherwise add here?  At the very least, this behavior 
> > > > change should be documented in the commit since it's otherwise lost in 
> > > > the mutex/EAGAIN stuff.  
> > 
> > That's a very good idea. I'll factor them out into a separate patch.
> 
> And now that I've factored it out, I noticed some more problems.
> 
> What we basically need is the following, I think:
> 
> - The code should not be interrupted while we process the channel
>   program, do the ssch etc. We want the caller to try again later (i.e.
>   return -EAGAIN)

We could also interrupt it e.g. by a TRANSLATE -> REQ_ABORT_TRANSLATE
state transition. The thread doing the translation could pick that up
and make sure we don't do the ssch(). Would match the architecture
better, but would be more complicated. And can be done any time later.

> - We currently do not want the user space to submit another channel
>   program while the first one is still in flight. As submitting another
>   one is a valid request, however, we should allow this in the future
>   (once we have the code to handle that in place).

I don't agree. There is at most one channel program processed by a
subchannel at any time. I would prefer an early error code if channel
programs are issued on top of each other (our virtual subchannel
is channel pending or FC start function bit set or similar).

Of course the interface exposed by the vfio-ccw module does not need to
look like the architecture interface. But IMHO any
unjustified deviation from the good old architecure ways of things will
just make it harder to reason about stuff. In the end you have that
interface both as input (guests passed-thorough subchannel) and as
output (the subchannel in the host that's being passed-thorough).

> - With the async interface, we want user space to be able to submit a
>   halt/clear while a start request is still in flight, but not while
>   we're processing a start request with translation etc. We probably
>   want to do -EAGAIN in that case.

This reads very similar to your first point.

> 
> My idea would be:
> 
> - The BUSY state denotes "I'm busy processing a request right now, try
>   again". We hold it while processing the cp and doing the ssch and
>   leave it afterwards (i.e., while the start request is processed by
>   the hardware). I/O requests and async requests get -EAGAIN in that
>   state.
> - A new state (CP_PENDING?) is entered after ssch returned with cc 0
>   (from the BUSY state). We stay in there as long as no final state for
>   that request has been received and delivered. (This may be final
>   interrupt for that request, a deferred cc, or successful halt/clear.)
>   I/O requests get -EBUSY, async requests are processed. This state can
>   be removed again once we are able to handle more than one outstanding
>   cp.
> 
> Does that make sense?
> 

AFAIU your idea is to split up the busy state into two states: CP_PENDING
and of busy without CP_PENDING called BUSY. I like the idea of having a
separate state for CP_PENDING but I don't like the new semantic of BUSY.

Hm mashing a conceptual state machine and the jumptabe stuff ain't
making reasoning about this simpler either. I'm taking about the
conceptual state machine. It would be nice to have a picture of it and
then think about how to express that in code.

Regards,
Halil