Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning

Cornelia Huck <cohuck@xxxxxxxxxx> · Tue, 5 Nov 2019 10:15:36 +0100

On Mon, 4 Nov 2019 19:38:27 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 04.11.19 18:17, Cornelia Huck wrote:
> > On Mon, 4 Nov 2019 15:42:11 +0100
> > David Hildenbrand <david@xxxxxxxxxx> wrote:
> >   
> >> On 04.11.19 15:08, David Hildenbrand wrote:  
> >>> On 04.11.19 14:58, Christian Borntraeger wrote:  

> >>>>> How hard would it be to
> >>>>>
> >>>>> 1. Detect the error condition
> >>>>> 2. Try a read on the affected page from the CPU (which will automatically convert to encrypted/!secure)
> >>>>> 3. Restart the I/O
> >>>>>
> >>>>> I assume that this is a corner case where we don't really have to care about performance in the first shot.  
> >>>>
> >>>> We have looked into this. You would need to implement this in the low level
> >>>> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
> >>>> to stop?  
> >>>
> >>> If that's the real fix, we should do that. Maybe one can focus on the
> >>> real use cases first. But I am no I/O expert, so my judgment might be
> >>> completely wrong.
> >>>      
> >>
> >> Oh, and by the way, as discussed you really only have to care about
> >> accesses via "real" I/O devices (IOW, not via the CPU). When accessing
> >> via the CPU, you should have automatic conversion back and forth. As I
> >> am no expert on I/O, I have no idea how iscsi fits into this picture
> >> here (especially on s390x).
> >>  
> > 
> > By "real" I/O devices, you mean things like channel devices, right? (So
> > everything where you basically hand off control to a different kind of
> > processor.)
> > 
> > For classic channel I/O (as used by dasd), I'd expect something like
> > getting a check condition on a ccw if the CU or device cannot access
> > the memory. You will know how far the channel program has progressed,
> > and might be able to restart (from the beginning or from that point).
> > Probably has a chance of working for a subset of channel programs.

NB that there's more than simple reads/writes... there could also be
control commands, some of which do reads/writes as well.

> > 
> > For QDIO (as used by FCP), I have no idea how this could work, as we
> > have long-running channel programs there and any error basically kills
> > the queues, which you would have to re-setup from the beginning.
> > 
> > For PCI devices, I have no idea how the instructions even act.
> > 
> > From my point of view, that error/restart approach looks nice on paper,
> > but it seems hard to make it work in the general case (and I'm unsure
> > if it's possible at all.)  
> 
> One thought: If all we do during an I/O request is read or write (or 
> even a mixture), can we simply restart the whole I/O again, although we 
> did partial reads/writes? This would eliminate the "know how far the 
> channel program has progressed". On error, one would have to touch each 
> involved page (e.g., try to read first byte to trigger a conversion) and 
> restart the I/O. I can understand that this might sound simpler than it 
> is (if it is even possible)

Control commands might have side effects, though. Problems there
should be uncommon, but there's still the _general_ case :(

Also, there's stuff like rewriting the channel program w/o prefetch,
jumping with TIC, etc. Linux probably does not do the former, but at
least the dasd driver uses NOP/TIC for error recovery.

> and might still be problematic for QDIO as 
> far as I understand. Just a thought.

Yes, given that for QDIO, establishing the queues is simply one
long-running channel program...
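
To make the "touch every involved page, then restart the whole I/O"
idea from above a bit more concrete, here is a toy user-space model. It
is only a sketch under stated assumptions: ToyGuest, AccessError and
run_with_restart are hypothetical names, a "program" is just a list of
(op, page) tuples, and nothing here models the real channel subsystem,
QDIO, or any kernel interface. It assumes that a CPU access makes a page
reachable for the device again, as discussed earlier in the thread.

```python
from dataclasses import dataclass, field


class AccessError(Exception):
    """The simulated device hit a page it cannot access."""


@dataclass
class ToyGuest:
    # Hypothetical model: pages the device currently cannot access.
    secure_pages: set = field(default_factory=set)
    # Number of times a control command took effect (its side effect).
    control_count: int = 0

    def cpu_touch(self, page):
        # Assumption from the thread: a CPU access converts the page
        # so that the device can reach it afterwards.
        self.secure_pages.discard(page)

    def run_io(self, program):
        # Run the whole program from the start; stop at the first
        # inaccessible page.
        for op, page in program:
            if op == "control":
                self.control_count += 1
            elif page in self.secure_pages:
                raise AccessError(page)


def run_with_restart(guest, program, max_restarts=1):
    """Return the number of restarts that were needed."""
    for restarts in range(max_restarts + 1):
        try:
            guest.run_io(program)
            return restarts
        except AccessError:
            if restarts == max_restarts:
                raise
            # Touch every page involved, then restart from the beginning.
            for _op, page in program:
                if page is not None:
                    guest.cpu_touch(page)


guest = ToyGuest(secure_pages={2})
program = [("read", 1), ("control", None), ("write", 2)]
restarts = run_with_restart(guest, program)
print(restarts, guest.control_count)  # prints "1 2"
```

In this model a program consisting only of reads and writes restarts
cleanly, but the control command in the example takes effect twice,
which is the side-effect concern with blindly restarting from the
beginning.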