Re: [PATCH RFC 2/2] vfio-ccw: support for halt/clear subchannel

Pierre Morel <pmorel@xxxxxxxxxxxxx> · Tue, 12 Jun 2018 15:56:06 +0200

On 12/06/2018 11:59, Cornelia Huck wrote:
On Fri, 8 Jun 2018 17:51:27 +0200
Pierre Morel <pmorel@xxxxxxxxxxxxx> wrote:

On 08/06/2018 16:45, Cornelia Huck wrote:
On Fri, 8 Jun 2018 15:13:28 +0200
Halil Pasic <pasic@xxxxxxxxxxxxx> wrote:

On 06/08/2018 02:20 PM, Cornelia Huck wrote:
My proposal is to do the same
copying to scsw(r) again, which would mean we get a request with both
the halt and the start bit set. The vfio code now needs to do a hsch
(instead of a ssch). The real channel subsystem should figure this out,
as we can't reliably check whether the start function has concluded
already (there's always a race window).
This I do not agree scsw(r) is part of the driver.
The interface here is not a device interface anymore but a driver
interface.
SCSW is a status, it is at its place in QEMU device interface with the
guest
but here pwrite() sends a command.
Hm, I rather consider that "we write a status, and the backend figures
out what to do based on that status".

The status of what? Kind of a target status?

I think this approach is the source of lots of complications. For instance
take xsch. How are we supposed to react to a guest xsch (in QEMU and
in the kernel module)? My guess is that the right thing to do is to issue
an xsch in the vfio-ccw kernel module on the passed through subchannel.
But there is no bit in fctl for cancel.

Bottom line is: I'm not happy with the current design but I'm not sure
if it's practical to do something about it (i.e. change it radically).
It might make sense to keep this for ssch, maybe reuse it for hsch/csch,
I do not think we need to change the interface radically but
I also do not thing we should extend it by using multiple commands
in a single syscall.

Currently:
    - only SSCH bit is used
    - only the SSCH instruction is implemented
    - all other bits, CSCH,HSCH produce an error when used alone
      or are ignored in conjonction with SSCH
     - there is no implementation using the other bits
     - It is not specified in the documentation that multiple commands
       can be used.
Looking at Documentation/s390/vfio-ccw.txt, it states that "scsw_area
should be filled with the SCSW of the Virtual Subchannel". This seems
to indicate that this is really intended to be a scsw... but I agree,
it does lack details.

Looking at these, I think there is no trouble to modify the way
the Kernel interface is implemented without impact on current QEMU.

But if we begin to allow ssch/hsch/csch in a single command
in a new implementation we will be stuck with it.
Yes, we're currently still free to go in different directions; adding
support for hsch/csch to the interface in the way I did would fix it.
In any case, we need better documentation :/

and think about something else for other things we want to handle
Yes we will need to have another interface, ioctl, or new region,
all possible, but really more complex.

(xsch, channel monitoring, the path handling stuff for which we already
We can use another region for getting up information on path handling
or monitoring, as does the patch IIRC.
This is not a problem.
Not a problem, I agree (and yes, the patch did that).

For xsch, I like Halil's suggestion of simply always setting cc 2.

Channel monitoring is a difficult beast (need to pass through msch, mix
of virtual and passthrough devices on the machine which have different
semantics etc.) I put some of my concerns into
https://wiki.qemu.org/ToDo/Channel_I/O_Passthrough; please add to that
if you have further thoughts.

We should keep those requirements in mind, even if we won't implement
support for it right now.

had a prototype etc.) It's probably not practical to do radical surgery
on the existing code.
There is no need for radical surgery, no change is required to older or
current QEMU code.
My concern is to avoid a future implementation merging multiple commands
in a single syscall.
It is not only a problem of beauty of the interface,
using a status is for the up-stream, from device to program.
Using the same construct, same name and same location, to produce commands
for the down stream is misleading and source of incoherence.
OK, I think I see your concern now.

What happens on the real hardware is that we do a ssch etc. and this
triggers a change that is visible in the scsw when we do a stsch.

What happens here is that the guest doing a ssch triggers a change in
our virtual scsw in QEMU (so far, so good); then we send this scsw to
the vfio module, which looks at the scsw to figure out that it should
do a ssch on the host. This works fine to figure out that a ssch needs
to be done, and would also work for hsch and csch, but it is really a
bit of reverse engineering, and it would probably fail for rsch
(haven't thought about that yet). To parse the intention of doing a
halt or clear out of the scsw_area, we need to rely (a) on user space
doing the right thing, and (b) on us implementing the rules for which
function can be initiated when correctly. If we treat fctl as a simple
command field, we'll just do what user space asks of us directly.

So, what are you proposing? Being more specific and stating that the
scsw is not necessarily a real scsw, but merely a vehicle for sending a
command? Or keeping it as it is now for ssch, and adding a second
interface for hsch/csch (and maybe rsch, msch, ...)?

I said no radical surgery, but after thinking more about it...
I am not sure.

Let's explain this:

I see 2 ways to proceed but my favorite is definitively to introduce 
versioning.

Way 1)

This was the way I first thought about.
We keep the existing IO Regionand structures, and are more
specific by stating that the io_region is a command region during write
entry and a status region during interrupt handling:
This allow us to use the 3 bits of the FCTL field of the SCSW to pass
commands to the kernel and stay backward compatible.
The FCTL field has 3 bits => we can have 8 commands.

PRO: small change

CONTRA: This is still confusing, we do not really solve this
        unclarity problem since QEMU view / documentation and
        Linux view / documentation differ or we update QEMU.
        Moreover this does not allow for long term extensions
        and/or re-design.

Way 2)

We use the device VFIO versioning using the capability chain to advertise
a new interface.

This the preferred way, it is sane, allows for the userland backward
compatibility and allows to have a new command interface, extensible
for future use.

In this case we can extend the interface to be any kind we want
in a next version, pwrite with new io_region, mmap on new
IO regions, status region...

PRO: Extensible and also goes in the VFIO INFO extension direction
     proposed by Alex

CONTRA: I see none outer more work to do (but is it a problem?)

====================

Extra argumentation for versioning support

Suppose a future implementation with 4 mapped regions like
the following:

- A Status region (RO updated as result of command and IRQ)

- A command region (WO where the user send its commands)
  userland write here to trigger IO (quite as currently)

- A CCW program region (RW where the CCW chain is handled)
  most handling done from userspace, last translations in kernel,
  double buffering...

- A performance / measurement / statistics region (RO)
  This is updated asynchronously by hardware and/or driver

This is purely theoretical and we do not need to do all at once
but if we want to extend the implementation without problems
and continue backward compatibility the versioning and capability
handling is a must.

====================

Regards,

Pierre

--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany