On 11/4/19 11:19 AM, David Hildenbrand wrote:
>>>> to synchronize page import/export with the I/O for paging. For example you can actually
>>>> fault in a page that is currently under paging I/O. What do you do? import (so that the
>>>> guest can run) or export (so that the I/O will work). As this turned out to be harder than
>>>> we thought, we decided to defer paging to a later point in time.
>>>
>>> I don't quite see the issue yet. If you page out, the page will
>>> automatically (on access) be converted to !secure/encrypted memory. If
>>> the UV/guest wants to access it, it will be automatically converted to
>>> secure/unencrypted memory. If you have concurrent access, it will be
>>> converted back and forth until one party is done.
>>
>> IO does not trigger an export on an imported page, but an error
>> condition in the IO subsystem. The page code does not read pages through
>
> Ah, that makes it much clearer. Thanks!
>
>> the cpu, but often just asks the device to read directly and that's
>> where everything goes wrong. We could bounce swapping, but chose to pin
>> for now until we find a proper solution to that problem which nicely
>> integrates into linux.
>
> How hard would it be to
>
> 1. Detect the error condition
> 2. Try a read on the affected page from the CPU (which will automatically
>    convert to encrypted/!secure)
> 3. Restart the I/O
>
> I assume that this is a corner case where we don't really have to care
> about performance in the first shot.

Restarting IO can be quite difficult with CCW; we might need to change
request data...

>
>>
>>>
>>> A proper automatic conversion should make this work. What am I missing?
>>>
>>>>
>>>> As we do not want to rely on the userspace to do the mlock this is now done in the kernel.
>>>
>>> I wonder if we could come up with an alternative (similar to how we
>>> override VM_MERGEABLE in the kernel) that can be called and ensured in
>>> the kernel. E.g., marking whole VMAs as "don't page" (I remember
>>> something like "special VMAs" like used for VDSOs that achieve exactly
>>> that, but I am absolutely no expert on that). That would be much nicer
>>> than pinning all pages and remembering what you pinned in huge page
>>> arrays ...
>>
>> It might be more worthwhile to just accept one or two releases with
>> pinning and fix the root of the problem than design a nice stopgap.
>
> Quite honestly, to me this feels like a prototype hack that deserves a
> proper solution first. The issue with this hack is that it affects user
> space (esp. MADV_DONTNEED no longer working correctly). It's not just
> something you once fix in the kernel and be done with it.

It is a hack, yes.
But we're not the only architecture to need it: x86 pins all the memory
at the start of the VM, and that code is already upstream...

>>
>> Btw. s390 is not alone with the problem and we'll try to have another
>> discussion tomorrow with AMD to find a solution which works for more
>> than one architecture.
>
> Let me know if there was an interesting outcome.
>
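
For reference, the three-step recovery proposed above (detect the error, read the page from the CPU, restart the I/O) can be sketched as a toy user-space model. This is only an illustration of the control flow: every name in it (Page, SecurePageError, device_read, page_out) is hypothetical, and none of it corresponds to kernel, UV, or CCW interfaces.

```python
# Toy model only: all names are hypothetical, not kernel or ultravisor APIs.

class SecurePageError(Exception):
    """Raised when a device is asked to read a page that is still secure."""


class Page:
    def __init__(self, data):
        self.data = data
        self.secure = True  # imported: devices cannot read it directly

    def cpu_read(self):
        # Models the automatic conversion on CPU access: the page is
        # exported (encrypted/!secure) before the host sees its contents.
        self.secure = False
        return self.data


def device_read(page):
    # Models direct device access, which does not trigger an export
    # but fails with an error condition instead.
    if page.secure:
        raise SecurePageError("I/O on secure page")
    return page.data


def page_out(page, max_retries=1):
    """Return (data, retries) after at most max_retries recovery attempts."""
    retries = 0
    while True:
        try:
            return device_read(page), retries
        except SecurePageError:      # step 1: detect the error condition
            if retries >= max_retries:
                raise
            page.cpu_read()          # step 2: CPU access forces the export
            retries += 1             # step 3: loop around to restart the I/O
```

The model deliberately ignores the two hard parts raised in this thread: the guest may fault the page back in (re-import it) between step 2 and step 3, and restarting a request is not a simple loop with CCW, where the request data may have to change.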