On 7/19/2021 6:51 PM, Erdem Aktas wrote:
>> There's one exception to this, which is the previous memory view in
>> crash kernels. But that's a relatively obscure case and there might be
>> other solutions for this.
>
> I think this is an important angle. It might cause reliability issues.
> If the kexec kernel does not know which page is shared or private, it can
> use a previously shared page as a code page, which will not work. It is
> also a security concern. Hosts can always cause crashes, which forces
> guests to do kexec for a crash dump. If the kexec kernel does not know
> which pages were validated before, it might be compromised with page
> replay attacks.

First, I suspect that for crash dumps it's not a real security problem if a
malicious hypervisor injects zeroed pages. That means actual strong
checks against revalidation/reaccept are not needed.

That still leaves the issue of triggering an exception when the memory is
not there. TDX has an option to trigger a #VE in this case, but we will
actually force disable it to avoid #VE in the system call gap. But the
standard crash dump tools already know how to parse mem_map to distinguish
different types of pages, so they would already be able to do that. We just
need some kind of safety mechanism to prevent any crashes, but that should
be doable. Actually I'm not sure such a mechanism is really needed, because
that's a root operation.

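To illustrate the dump-tool side, here is a rough, untested sketch of filtering pages by type before reading them. The enum, the function names and the policy of dumping only private pages are all made up for illustration; a real tool would derive the page type from mem_map rather than from an array like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page types, standing in for what a dump tool would
 * work out from mem_map. These are not real kernel names. */
enum sketch_page_type {
	SKETCH_PAGE_PRIVATE,     /* accepted, guest-private memory */
	SKETCH_PAGE_SHARED,      /* memory shared with the host */
	SKETCH_PAGE_UNACCEPTED,  /* never accepted/validated */
};

/* Assumed policy: only private pages are safe to read from the
 * crash kernel; everything else is skipped. */
int sketch_page_is_dumpable(enum sketch_page_type type)
{
	return type == SKETCH_PAGE_PRIVATE;
}

/* Store the indexes of dumpable pages in out[] (which must have room
 * for npages entries) and return how many were selected. */
size_t sketch_select_dump_pages(const enum sketch_page_type *types,
				size_t npages, size_t *out)
{
	size_t n = 0;

	assert(types != NULL && out != NULL);
	for (size_t pfn = 0; pfn < npages; pfn++) {
		if (sketch_page_is_dumpable(types[pfn]))
			out[n++] = pfn;
	}
	return n;
}
```

The point is only that the skip decision happens before any access, so a page that is not there is never touched.
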
> Also kexec is not only for crash dumps. For warm resets, the kexec kernel
> needs to know the valid page map.

For non-crash kexec it's fine to reaccept/validate memory because we
don't care about the old contents anymore, except for the kernel itself
and perhaps your stack/page tables. So something very simple is enough
for that too.

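A rough, untested sketch of that "something very simple": walk all page frames and reaccept everything outside a short list of preserved ranges. The struct, the function names and the byte array standing in for the accept operation are hypothetical; the actual platform accept/validate call is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical range the new kernel must leave alone (kernel image,
 * stack, page tables). Page frame numbers, end exclusive. */
struct sketch_range {
	size_t start_pfn;
	size_t end_pfn;
};

/* Return 1 if pfn falls inside any preserved range, 0 otherwise. */
int sketch_pfn_preserved(size_t pfn, const struct sketch_range *keep,
			 size_t nkeep)
{
	for (size_t i = 0; i < nkeep; i++) {
		if (pfn >= keep[i].start_pfn && pfn < keep[i].end_pfn)
			return 1;
	}
	return 0;
}

/* Mark every non-preserved page as reaccepted by setting accepted[pfn]
 * to 1. In a real guest this store would be the platform's accept
 * operation. Preserved pages are left untouched. Returns the number
 * of pages that were reaccepted. */
size_t sketch_reaccept_all(unsigned char *accepted, size_t npages,
			   const struct sketch_range *keep, size_t nkeep)
{
	size_t count = 0;

	assert(accepted != NULL);
	for (size_t pfn = 0; pfn < npages; pfn++) {
		if (sketch_pfn_preserved(pfn, keep, nkeep))
			continue;
		accepted[pfn] = 1;
		count++;
	}
	return count;
}
```

No record of the old shared/private state is needed for this path, since everything outside the preserved ranges is treated the same way.
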
>> Also in general i don't think it will really happen, at least
>> initially.
>> All the shared buffers we use are allocated and never freed. So such a
>> problem could be deferred.
>
> Does it not depend on kernel configs? Currently, there is a valid
> control path in dma_alloc_coherent which might alloc and free shared
> pages.

If the device filter is active it won't.

>> At the risk of asking a potentially silly question, would it be
>> reasonable to treat non-validated memory as not-present for kernel
>> purposes and hot-add it in a thread as it gets validated?
>
> My concern with this is that it assumes that all the present memory is
> private. UEFI might have some pages which are shared and therefore also
> present.

Hot add is nearly always a bad idea.

-Andi