Re: [RFC 09/37] KVM: s390: protvirt: Implement on-demand pinning

On 04.11.19 18:17, Cornelia Huck wrote:
On Mon, 4 Nov 2019 15:42:11 +0100
David Hildenbrand <david@xxxxxxxxxx> wrote:

On 04.11.19 15:08, David Hildenbrand wrote:
On 04.11.19 14:58, Christian Borntraeger wrote:


On 04.11.19 11:19, David Hildenbrand wrote:
to synchronize page import/export with the I/O for paging. For example you can actually
fault in a page that is currently under paging I/O. What do you do? import (so that the
guest can run) or export (so that the I/O will work). As this turned out to be harder than
we thought, we decided to defer paging to a later point in time.

I don't quite see the issue yet. If you page out, the page will
automatically (on access) be converted to !secure/encrypted memory. If
the UV/guest wants to access it, it will be automatically converted to
secure/unencrypted memory. If you have concurrent access, it will be
converted back and forth until one party is done.

IO does not trigger an export on an imported page, but an error
condition in the IO subsystem. The page code does not read pages through

Ah, that makes it much clearer. Thanks!
the cpu, but often just asks the device to read directly and that's
where everything goes wrong. We could bounce swapping, but chose to pin
for now until we find a proper solution to that problem which nicely
integrates into linux.

How hard would it be to

1. Detect the error condition
2. Try a read on the affected page from the CPU (which will automatically convert to encrypted/!secure)
3. Restart the I/O
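
To make the three steps concrete, here is a minimal user-space sketch of the detect/touch/restart loop. It is purely illustrative and not taken from the patch series: `sim_page`, `cpu_touch`, `device_io` and `io_with_retry` are hypothetical names, and the model assumes that a CPU access always leaves the page in the encrypted/!secure state and that the guest does not re-import it in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum page_state { PAGE_SECURE, PAGE_EXPORTED };

struct sim_page {
	enum page_state state;
};

/* Step 2: a CPU access converts a secure page to encrypted/!secure. */
static void cpu_touch(struct sim_page *page)
{
	assert(page != NULL);
	page->state = PAGE_EXPORTED;
}

/* A device access bypasses the CPU: it fails on a secure page
 * instead of triggering an export. */
static bool device_io(const struct sim_page *page)
{
	return page->state == PAGE_EXPORTED;
}

/* Returns the number of I/O attempts used, or -1 if max_attempts
 * was exhausted without a successful I/O. */
static int io_with_retry(struct sim_page *page, int max_attempts)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (device_io(page))
			return attempt;
		/* Step 1: the error was detected. Step 2: touch via CPU. */
		cpu_touch(page);
		/* Step 3: the next loop iteration restarts the I/O. */
	}
	return -1;
}
```

The retry bound stands in for the concurrent case: if the guest keeps importing the page again, a real implementation would have to decide how often to retry before giving up.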

I assume that this is a corner case where we don't really have to care about performance in the first shot.

We have looked into this. You would need to implement this in the low level
handler for every I/O. DASD, FCP, PCI based NVME, iscsi. Where do you want
to stop?

If that's the real fix, we should do that. Maybe one can focus on the
real use cases first. But I am no I/O expert, so my judgment might be
completely wrong.

Oh, and by the way, as discussed you really only have to care about
accesses via "real" I/O devices (IOW, not via the CPU). When accessing
via the CPU, you should have automatic conversion back and forth. As I
am no expert on I/O, I have no idea how iscsi fits into this picture
here (especially on s390x).


By "real" I/O devices, you mean things like channel devices, right? (So
everything where you basically hand off control to a different kind of
processor.)

Exactly.


For classic channel I/O (as used by dasd), I'd expect something like
getting a check condition on a ccw if the CU or device cannot access
the memory. You will know how far the channel program has progressed,
and might be able to restart (from the beginning or from that point).
Probably has a chance of working for a subset of channel programs.
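
As a rough illustration of "restart from that point", the following user-space sketch resumes a simulated channel program at the CCW that failed. Everything here is an assumption made for the example: `sim_ccw`, `run_channel_program` and `run_with_resume` are hypothetical, real channel programs carry far more state, and marking the buffer accessible merely stands in for exporting the affected page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CCW; not the real struct ccw1. */
struct sim_ccw {
	bool buffer_accessible;	/* can the device reach the data buffer? */
	bool done;		/* has this CCW completed? */
};

/* Runs CCWs from index start. Returns the index of the first CCW whose
 * buffer is inaccessible, or n if the whole program completed. */
static size_t run_channel_program(struct sim_ccw *prog, size_t n, size_t start)
{
	assert(prog != NULL);
	for (size_t i = start; i < n; i++) {
		if (!prog[i].buffer_accessible)
			return i;
		prog[i].done = true;
	}
	return n;
}

/* Resumes at the failing CCW after making its buffer accessible.
 * Returns the number of restarts that were needed. */
static int run_with_resume(struct sim_ccw *prog, size_t n)
{
	size_t pos = 0;
	int restarts = 0;

	while ((pos = run_channel_program(prog, n, pos)) < n) {
		prog[pos].buffer_accessible = true; /* stand-in for an export */
		restarts++;
	}
	return restarts;
}
```

This only covers programs where resuming mid-way is safe, which matches the "subset of channel programs" caveat above.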

Yeah, sounds sane to me.


For QDIO (as used by FCP), I have no idea how this could work, as we
have long-running channel programs there and any error basically kills
the queues, which you would have to re-setup from the beginning.

For PCI devices, I have no idea how the instructions even act.

From my point of view, that error/restart approach looks nice on paper,
but it seems hard to make it work in the general case (and I'm unsure
if it's possible at all.)

Then I'm afraid whoever designed protected virtualization didn't properly consider concurrent I/O access to encrypted pages. It might not be easy to sort out, though, so I understand why the I/O part was designed that way :)

I was wondering if one could implement some kind of automatic conversion "back and forth" on I/O access (or even on any access within the HW). I mean, "basically" it's just encrypting/decrypting the page and updating the state by the UV (+ synchronization, lol). But yeah, the UV is involved, and would be triggered somehow via I/O access to these pages. Right now that conversion is performed via exceptions by the OS explicitly. Instead of passing exceptions, the UV could convert automatically. Smells like massive HW changes, if possible and desired at all.

I do wonder what would happen if you back your guest memory not on anonymous memory but on e.g., a file. Could be that this eliminates all options besides pinning and fixing I/O, because we're talking about writeback and not paging.

HOWEVER, reading https://lwn.net/Articles/787636/

"Kara talked mostly about the writeback code; in some cases, it will simply skip pages that are pinned. But there are cases where those pages must be written out — "somebody has called fsync(), and they expect something to be saved". In this case, pinned pages will be written, but they will not be marked clean at the end of the operation; they will still be write-protected in the page tables while writeback is underway, though."

So, sounds like you will get concurrent I/O access even without paging ... and that would leave fixing I/O the only option with the current HW design AFAIKS :/

--

Thanks,

David / dhildenb




