On Mon, 4 Nov 2019 19:38:27 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

> On 04.11.19 18:17, Cornelia Huck wrote:
> > On Mon, 4 Nov 2019 15:42:11 +0100
> > David Hildenbrand <david@xxxxxxxxxx> wrote:
> >
> >> On 04.11.19 15:08, David Hildenbrand wrote:
> >>> On 04.11.19 14:58, Christian Borntraeger wrote:
> >>>>> How hard would it be to
> >>>>>
> >>>>> 1. Detect the error condition
> >>>>> 2. Try a read on the affected page from the CPU (which will automatically convert to encrypted/!secure)
> >>>>> 3. Restart the I/O
> >>>>>
> >>>>> I assume that this is a corner case where we don't really have to care about performance in the first shot.
> >>>>
> >>>> We have looked into this. You would need to implement this in the low level
> >>>> handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
> >>>> to stop?
> >>>
> >>> If that's the real fix, we should do that. Maybe one can focus on the
> >>> real use cases first. But I am no I/O expert, so my judgment might be
> >>> completely wrong.
> >>>
> >>
> >> Oh, and by the way, as discussed you really only have to care about
> >> accesses via "real" I/O devices (IOW, not via the CPU). When accessing
> >> via the CPU, you should have automatic conversion back and forth. As I
> >> am no expert on I/O, I have no idea how iscsi fits into this picture
> >> here (especially on s390x).
> >>
> >
> > By "real" I/O devices, you mean things like channel devices, right? (So
> > everything where you basically hand off control to a different kind of
> > processor.)
> >
> > For classic channel I/O (as used by dasd), I'd expect something like
> > getting a check condition on a ccw if the CU or device cannot access
> > the memory. You will know how far the channel program has progressed,
> > and might be able to restart (from the beginning or from that point).
> > Probably has a chance of working for a subset of channel programs.

NB that there's more than simple reads/writes... could also be control
commands, some of which do read/writes as well.

> >
> > For QDIO (as used by FCP), I have no idea how this could work, as we
> > have long-running channel programs there and any error basically kills
> > the queues, which you would have to re-setup from the beginning.
> >
> > For PCI devices, I have no idea how the instructions even act.
> >
> > From my point of view, that error/restart approach looks nice on paper,
> > but it seems hard to make it work in the general case (and I'm unsure
> > if it's possible at all.)
>
> One thought: If all we do during an I/O request is read or write (or
> even a mixture), can we simply restart the whole I/O again, although we
> did partial reads/writes? This would eliminate the "know how far the
> channel program has progressed". On error, one would have to touch each
> involved page (e.g., try to read first byte to trigger a conversion) and
> restart the I/O. I can understand that this might sound simpler than it
> is (if it is even possible)

Any control commands might have side effects, though. Problems there
should be uncommon; there's still the _general_ case, though :(

Also, there's stuff like rewriting the channel program w/o prefetch,
jumping with TIC, etc. Linux probably does not do the former, but at
least the dasd driver uses NOP/TIC for error recovery.

> and might still be problematic for QDIO as
> far as I understand. Just a thought.

Yes, given that for QDIO, establishing the queues is simply one
long-running channel program...
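
For illustration only, the "touch each involved page and restart the whole I/O" idea from the thread can be sketched as a small user-space toy model in C. Nothing here is kernel or s390 code: every name (toy_page, toy_cmd, toy_dev, toy_run, toy_touch, toy_run_with_restart) is hypothetical, the "secure" flag merely stands in for a page the device cannot access, and toy_touch() stands in for the CPU read that would trigger the conversion. The control command is modelled as a plain counter so that the side-effect concern becomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_page {
    bool secure;                 /* true: the toy device cannot access it */
};

enum toy_op { TOY_READ, TOY_WRITE, TOY_CONTROL };

struct toy_cmd {
    enum toy_op op;
    struct toy_page *page;       /* NULL for TOY_CONTROL */
};

struct toy_dev {
    unsigned int control_side_effects;  /* times a control command ran */
};

/*
 * Run the whole program from the start. Returns the index of the command
 * that hit an inaccessible page, or -1 if every command completed.
 */
static int toy_run(struct toy_dev *dev, const struct toy_cmd *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == TOY_CONTROL) {
            dev->control_side_effects++;   /* happens again on every restart */
            continue;
        }
        if (prog[i].page->secure)
            return (int)i;
    }
    return -1;
}

/* Stand-in (assumption) for a CPU read that converts the page. */
static void toy_touch(struct toy_page *page)
{
    page->secure = false;
}

/*
 * On error, touch every page the program references and restart from the
 * beginning. Returns the number of restarts needed, or -1 after giving up.
 */
static int toy_run_with_restart(struct toy_dev *dev,
                                const struct toy_cmd *prog, size_t n,
                                int max_restarts)
{
    assert(max_restarts >= 0);

    for (int restarts = 0; ; restarts++) {
        if (toy_run(dev, prog, n) < 0)
            return restarts;
        if (restarts == max_restarts)
            return -1;
        for (size_t i = 0; i < n; i++)
            if (prog[i].page)
                toy_touch(prog[i].page);
    }
}
```

Running a three-command program (one control command, one read on an accessible page, one write to a "secure" page) through toy_run_with_restart() succeeds after a single restart, but control_side_effects ends up at 2 rather than 1. That is the toy version of the objection above: restarting from the beginning is harmless for pure reads/writes, while any command with side effects gets executed twice.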