On Mon, 13 Apr 2020 13:22:24 -0700 Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > On 3/6/20 5:25 AM, Claudio Imbrenda wrote: > > On s390x the function is not supposed to fail, so it is ok to use a > > WARN_ON on failure. If we ever need some more finegrained handling > > we can tackle this when we know the details. > > Could you explain a bit why the function can't fail? the concept of "making accessible" is only to make sure that accessing the page will not trigger faults or I/O or DMA errors. in general it does not mean freely accessing the content of the page in cleartext. on s390x, protected guest pages can be shared. the guest has to actively share its pages, and in that case those pages are both part of the protected VM and freely accessible by the host. pages that are not shared cannot be accessed by the host. in our case "making the page accessible" means: - if the page was shared, make sure it stays shared - if the page was not shared, first encrypt it and then make it accessible to the host (both operations performed securely and atomically by the hardware) then the page can be swapped out, or used for direct I/O (obviously if you do I/O on a page that was not shared, you cannot expect good things to happen, since you basically corrupt the memory of the guest). on s390x performing I/O directly on protected pages results in (in practice) unrecoverable I/O errors, so we want to avoid it at all costs. accessing protected pages from the CPU triggers an exception that can be handled (and we do handle it, in fact) now imagine a buggy or malicious qemu process crashing the whole machine just because it did I/O to/from a protected page. we clearly don't want that. > If the guest has secret data in the page, then it *can* and does fail. no, that's the whole point of this mechanism. in fact, most of the guest pages will be "secret data", only the few pages used for guest I/O bounce buffers will be shared with the host > It won't fail, though, if the host and guest agree on whether the page > is protected. > > Right? > > > @@ -2807,6 +2807,13 @@ int __test_set_page_writeback(struct page > > *page, bool keep_write) inc_zone_page_state(page, > > NR_ZONE_WRITE_PENDING); } > > unlock_page_memcg(page); > > + access_ret = arch_make_page_accessible(page); > > + /* > > + * If writeback has been triggered on a page that cannot > > be made > > + * accessible, it is too late to recover here. > > + */ > > + VM_BUG_ON_PAGE(access_ret != 0, page); > > + > > return ret; > > > > } > > This seems like a really odd place to do this. Writeback is specific > to block I/O. I would have thought there were other kinds of devices > that matter, not just block devices. well, yes and no. for writeback (block I/O and swap) this is the right place. at this point we know that the page is present and nobody else has started doing I/O yet, and I/O will happen soon-ish. so we make the page accessible. there is no turning back here, unlike pinning. we are not allowed to fail, we can't regarding the other kinds of devices: yes, they will use pinning, which is covered by the rest of the patch. the semantics of get page and pin page (if the documentation has not changed meanwhile) is that the traditional get_page is used for when the page is needed but not its content, and pin_page is used when the content of the page is accessed. since the only issue here is accessing the content of the page, we don't need to make it accessible for get_page, but only for pin_page. get_page and pin_page are allowed to fail, so in this case we return an error code, so other architectures can potentially abort the pinning if needed. on s390x we will never fail, for the same reasons written above. > Also, this patch seems odd that it only does the > arch_make_page_accessible() half. Where's the other half where the > page is made inaccessible? that is very arch-specific. for s390x, you can look at this patch and the ones immediately before/after: 214d9bbcd3a67230b932f6ce > I assume it's OK to "leak" things like this, it's just not clear to me > _why_ it's OK. nothing is being leaked :) I hope I clarified a little how this works on s390x :) feel free to poke me again if some things are still unclear best regards, Claudio Imbrenda