Re: [PATCH Part2 v6 14/49] crypto: ccp: Handle the legacy TMR allocation when SNP is enabled

"Kalra, Ashish" <ashish.kalra@xxxxxxx> · Wed, 16 Nov 2022 12:01:11 -0600

On 11/16/2022 4:25 AM, Vlastimil Babka wrote:
On 11/16/22 11:19, Kalra, Ashish wrote:
On 11/16/2022 3:08 AM, Vlastimil Babka wrote:
On 11/15/22 19:15, Kalra, Ashish wrote:

On 11/15/2022 11:24 AM, Kalra, Ashish wrote:
Hello Vlastimil,

On 11/15/2022 9:14 AM, Vlastimil Babka wrote:
Cc'ing memory failure folks, the beinning of this subthread is here:

https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F3a51840f6a80c87b39632dc728dbd9b5dd444cd7.1655761627.git.ashish.kalra%40amd.com%2F&amp;data=05%7C01%7Cashish.kalra%40amd.com%7C174b7caaf99a473194cd08dac7bcebf3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638041911481429347%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=CFkAXNQqangvCqhnwDyIUJUkfiUrX67OpKDTtLGj6PU%3D&amp;reserved=0

On 11/15/22 00:36, Kalra, Ashish wrote:
Hello Boris,

On 11/2/2022 6:22 AM, Borislav Petkov wrote:
On Mon, Oct 31, 2022 at 04:58:38PM -0500, Kalra, Ashish wrote:
         if (snp_lookup_rmpentry(pfn, &rmp_level)) {
                do_sigbus(regs, error_code, address, VM_FAULT_SIGBUS);
                return RMP_PF_RETRY;

Does this issue some halfway understandable error message why the
process got killed?

Will look at adding our own recovery function for the same, but that
will
again mark the pages as poisoned, right ?

Well, not poisoned but PG_offlimits or whatever the mm folks agree upon.
Semantically, it'll be handled the same way, ofc.

Added a new PG_offlimits flag and a simple corresponding handler for it.

One thing is, there's not enough page flags to be adding more (except
aliases for existing) for cases that can avoid it, but as Boris says, if
using alias to PG_hwpoison it depends what will become confused with the
actual hwpoison.

But there is still added complexity of handling hugepages as part of
reclamation failures (both HugeTLB and transparent hugepages) and that
means calling more static functions in mm/memory_failure.c

There is probably a more appropriate handler in mm/memory-failure.c:

soft_offline_page() - this will mark the page as HWPoisoned and also has
handling for hugepages. And we can avoid adding a new page flag too.

soft_offline_page - Soft offline a page.
Soft offline a page, by migration or invalidation, without killing
anything.

So, this looks like a good option to call
soft_offline_page() instead of memory_failure() in case of
failure to transition the page back to HV/shared state via
SNP_RECLAIM_CMD
and/or RMPUPDATE instruction.

So it's a bit unclear to me what exact situation we are handling here. The
original patch here seems to me to be just leaking back pages that are
unsafe for further use. soft_offline_page() seems to fit that scenario
of a
graceful leak before something is irrepairably corrupt and we page fault
on it.
But then in the thread you discus PF handling and killing. So what is the
case here? If we detect this need to call snp_leak_pages() does it mean:

a) nobody that could page fault at them (the guest?) is running
anymore, we
are tearing it down, we just can't reuse the pages further on the host

The host can page fault on them, if anything on the host tries to write to
these pages. Host reads will return garbage data.

- seem like soft_offline_page() could work, but maybe we could just put
the
pages on some leaked lists without special page? The only thing that
should
matter is not to free the pages to the page allocator so they would be
reused by something else.

b) something can stil page fault at them (what?) - AFAIU can't be resolved
without killing something, memory_failure() might limit the damage

As i mentioned above, host writes will cause RMP violation page fault.

And to add here, if its a guest private page, then the above fault cannot be
resolved, so the faulting process is terminated.

BTW would this not be mostly resolved as part of rebasing to UPM?
- host will not have these pages mapped in the first place (both kernel
directmap and qemu userspace)
- guest will have them mapped, but I assume that the conversion from private
to shared (that might fail?) can only happen after guest's mappings are
invalidated in the first place?

Yes, that will be true for guest private pages. But then there are host
allocated pages for firmware use which will remain in firmware page state or
reclaim state if they can't be transitioned back to HV/shared state once the
firmware releases them back to the host and accessing them at this point can
potentially cause RMP violation #PF.

Again i don't think this is going to happen regularly or frequently so it
will be a rare error case where the page reclamation, i.e., the transition
back to HV/shared state fails and now these pages are no longer safe to be
used.

Referring back to your thoughts about putting these pages on some leaked
pages list, do any such leaked pages list exist currently ?

Not AFAIK, you could just create a list_head somewhere appropriate (some snp
state structure?) and put the pages there, maybe with a counter exposed in
debugs. The point would be mostly that if something goes so wrong it would
be leaking substantial amounts of memory, we can at least recognize the
cause (but I suppose the dmesg will be also full of messages) and e.g. find
the pages in a crash dump.

Ok, so i will work on implementing this leaked pages list and put it on 
a sev/snp associated structure.

Also to add here, we will actually get a not-present #PF instead of the 
RMP violation #PF on writing to these leaked pages, as these pages would 
have been removed from the kernel direct map.

Thanks,
Ashish

Still waiting for some/more feedback from mm folks on the same.

Just send the patch and they'll give it.

Thx.