Re: [PATCHv7 10/14] x86/mm: Avoid load_unaligned_zeropad() stepping into unaccepted memory

Dave Hansen <dave.hansen@xxxxxxxxx> · Tue, 2 Aug 2022 16:46:38 -0700

On 7/26/22 03:21, Borislav Petkov wrote:
> On Tue, Jun 14, 2022 at 03:02:27PM +0300, Kirill A. Shutemov wrote:
>> But, this approach does not work for unaccepted memory. For TDX, a load
>> from unaccepted memory will not lead to a recoverable exception within
>> the guest. The guest will exit to the VMM where the only recourse is to
>> terminate the guest.
> FTR, this random-memory-access-to-unaccepted-memory-is-deadly thing is
> really silly. We should be able to handle such cases - because they do
> happen often - in a more resilient way. Just look at the complex dance
> this patch needs to do just to avoid this.
> 
> IOW, this part of the coco technology needs improvement.

This particular wound is self-inflicted.  The hardware can *today*
generate a #VE for these accesses.  But, to make writing the #VE code
more straightforward, we asked that the hardware not even bother
delivering the exception.  At the time, nobody could come up with a case
why there would ever be a legitimate, non-buggy access to unaccepted memory.

We learned about load_unaligned_zeropad() the hard way.  I never ran
into it and never knew it was there.  Dangit.

We _could_ go back to the way it was originally.  We could add
load_unaligned_zeropad() support to the #VE handler, and there's little
risk of load_unaligned_zeropad() itself being used in the
interrupts-disabled window early in the #VE handler.  That would get rid
of all the nasty adjacent page handling in the unaccepted memory code.

But, that would mean that we can land in the #VE handler from more
contexts.  Any normal, non-buggy use of load_unaligned_zeropad() can end
up there, obviously.  We would, for instance, need to be more careful
about #VE recursion.  We'd also have to make sure that _bugs_ that land
in the #VE handler can still be handled in a sane way.

To sum it all up, I'm not happy with the complexity of the page
acceptance code either but I'm not sure that it's bad tradeoff compared
to greater #VE complexity or fragility.

Does anyone think we should go back and really reconsider this?