On 06/07/2018 11:54 AM, Cornelia Huck wrote:
Hm, I think we need to be more precise as to what scsw we're talking about. Bad ascii art time:

--------------
|  scsw(g)   |           ssch
--------------             |
                           |        guest
--------------------------------------------------------------
                           |        qemu
--------------             v
|  scsw(q)   |          emulate
--------------             |
                           |
--------------             v
|  scsw(r)   |          pwrite()
--------------             |
                           |
--------------------------------------------------------------
                           |        vfio
                           v
                          ssch
                           |
--------------------------------------------------------------
                           |        hardware
--------------             v
|  scsw(h)   |   actually do something
--------------

The guest issues a ssch (which gets intercepted; it won't get control back until ssch finishes with a cc set.) scsw(g) won't change, unless the guest does a stsch for the subchannel on another vcpu, in which case it will get whatever information qemu holds in scsw(q) at that point in time.
(1) I think the BQL makes it pretty much the same whether the stsch comes from another vcpu or not. We will effectively start processing the stsch in QEMU only after we are done with the ssch in QEMU.
When qemu starts to emulate the guest's ssch, it will set the start function bit in the fctl field of scsw(q). It then copies scsw(q) to scsw(r) in the vfio region.
(2) This is architecturally wrong AFAIK. The fctl bit is supposed to be set on cc 0. But because of (1) this might not be observable by the guest -- we can fix it up.

(3) IMHO scsw(r) is not a real scsw as defined by the architecture but a strange communication structure (not) defined by vfio-ccw.
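To make point (2) concrete, here is a minimal sketch of recording the start function only once the cc of the ssch is known to be 0. Everything in it is an assumption made for illustration: `struct scsw_sketch`, `finish_ssch()` and the `FCTL_START` value are hypothetical and are not taken from QEMU or the vfio-ccw code.

```c
#include <stdint.h>

/* Illustrative bit value for this sketch only; not copied from QEMU or the kernel. */
#define FCTL_START 0x4u

/* Hypothetical, heavily reduced stand-in for an scsw: only fctl is modeled. */
struct scsw_sketch {
    uint8_t fctl;
};

/*
 * Set the start function bit in the guest-visible scsw(q) only when the
 * ssch ended with cc 0. The cc is returned unchanged so that the caller
 * can present it to the guest.
 */
static int finish_ssch(struct scsw_sketch *scsw_q, int cc)
{
    if (cc == 0) {
        scsw_q->fctl |= FCTL_START;
    }
    return cc;
}
```

With this ordering a non-zero cc leaves fctl in scsw(q) untouched, so nothing would need to be fixed up afterwards.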
The vfio code will then proceed to call ssch on the real subchannel. This is the first time we get really asynchronous, as the ssch will return with cc set and the start function will be performed at some point in time. If we did a stsch on the real subchannel, we would see that scsw(h) now has the start function bit set.
(4) I guess only if cc 0.
Currently, we won't return back up the chain until we get an interrupt from the hardware, at which time we update the scsw(r) from the irb. This will propagate into the scsw(q). At the time we finish handling the guest's ssch and return control to it, we're all done and if the guest does a stsch to update its scsw(g), it will get the current scsw(q) which will already contain the scsw from the interrupt's irb (indicating that the start function is already finished).

Now let's imagine we have a future implementation that handles actually performing the start on the hardware asynchronously, i.e. it returns control to the guest without the interrupt having been posted (let's say that it is a longer-running I/O request). If the guest now did a stsch to update scsw(g), it would get the current state of scsw(q), which would be "start function set, but not done yet".
(5) AFAIK this is how the current implementation works. We don't wait for the I/O interrupt on the host to present a cc to the guest for its ssch.
If the guest now does a hsch, it would trap in the same way as the ssch before. When qemu gets control, it adds the halt bit in scsw(q) (which is in accordance with the architecture).
(7) Again, the question is when fctl gets set according to the architecture...
My proposal is to do the same copying to scsw(r) again, which would mean we get a request with both the halt and the start bit set.
(8) IMHO when receiving the 'request' we are and should be in instruction context -- as opposed to basic I/O function context. So we should not set fctl before we know what our guest cc will be. But since scsw(r) is not a real scsw it is just strange.
The vfio code now needs to do a hsch (instead of a ssch). The real channel subsystem should figure this out, as we can't reliably check whether the start function has concluded already (there's always a race window).
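A minimal sketch of how the receiving side could choose the instruction from the function control bits of such a request, assuming halt takes precedence over start. The names (`pick_insn()`, `enum io_insn`) and the bit values are hypothetical and only serve the illustration; the real vfio-ccw code is not structured like this.

```c
#include <stdint.h>

/* Illustrative bit values for this sketch only; not copied from QEMU or the kernel. */
#define FCTL_START 0x4u
#define FCTL_HALT  0x2u

/* Hypothetical names for the instruction to issue on the real subchannel. */
enum io_insn { INSN_NONE, INSN_SSCH, INSN_HSCH };

/*
 * Decide which instruction to issue based on the fctl bits found in
 * scsw(r). A request carrying both halt and start results in hsch: we
 * do not try to find out whether the start function has concluded and
 * leave that to the channel subsystem.
 */
static enum io_insn pick_insn(uint8_t fctl_r)
{
    if (fctl_r & FCTL_HALT) {
        return INSN_HSCH;
    }
    if (fctl_r & FCTL_START) {
        return INSN_SSCH;
    }
    return INSN_NONE;
}
```

Note that the sketch never inspects any "start still in progress" state, which matches the point about the race window.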
(9) Yes, we can't tell for sure if the start function is still being performed by the stuff below.

Regards,
Halil
For csch, things are a bit different (which the code posted here did not take into account). The qemu emulation of csch needs to clear any start/halt bits in scsw(q) when setting the clear bit there, and therefore scsw(r) will only have the clear bit set in that case. We still should do an unconditional csch for the same reasons as above; the hardware will do the same things (clearing start/halt, setting clear) in the scsw(h).

Congratulations, you've reached the end :) I hope that was helpful and not too confusing.
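The csch handling described above (clear any start/halt bits, set the clear bit) can be sketched as follows. The helper name `fctl_after_csch()` and the bit values are assumptions made for this illustration and are not taken from the QEMU sources.

```c
#include <stdint.h>

/* Illustrative bit values for this sketch only; not copied from QEMU or the kernel. */
#define FCTL_START 0x4u
#define FCTL_HALT  0x2u
#define FCTL_CLEAR 0x1u

/*
 * Compute the fctl of scsw(q) after emulating csch: pending start and
 * halt functions are dropped and only the clear function remains.
 */
static uint8_t fctl_after_csch(uint8_t fctl_q)
{
    fctl_q &= (uint8_t)~(FCTL_START | FCTL_HALT);
    fctl_q |= FCTL_CLEAR;
    return fctl_q;
}
```

Copying the result to scsw(r) then yields a request with only the clear bit set, regardless of what was pending before.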