Re: [RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 7 May 2024 09:16:06 -0700

On Tue, May 07, 2024, Mickaël Salaün wrote:
> > Actually, potential bad/crazy idea.  Why does the _host_ need to define policy?
> > Linux already knows what assets it wants to (un)protect and when.  What's missing
> > is a way for the guest kernel to effectively deprivilege and re-authenticate
> > itself as needed.  We've been tossing around the idea of paired VMs+vCPUs to
> > support VTLs and SEV's VMPLs, what if we usurped/piggybacked those ideas, with a
> > bit of pKVM mixed in?
> > 
> > Borrowing VTL terminology, where VTL0 is the least privileged, userspace launches
> > the VM at VTL0.  At some point, the guest triggers the deprivileging sequence and
> > userspace creates VTL1.  Userpace also provides a way for VTL0 restrict access to
> > its memory, e.g. to effectively make the page tables for the kernel's direct map
> > writable only from VTL1, to make kernel text RO (or XO), etc.  And VTL0 could then
> > also completely remove its access to code that changes CR0/CR4.
> > 
> > It would obviously require a _lot_ more upfront work, e.g. to isolate the kernel
> > text that modifies CR0/CR4 so that it can be removed from VTL0, but that should
> > be doable with annotations, e.g. tag relevant functions with __magic or whatever,
> > throw them in a dedicated section, and then free/protect the section(s) at the
> > appropriate time.
> > 
> > KVM would likely need to provide the ability to switch VTLs (or whatever they get
> > called), and host userspace would need to provide a decent amount of the backend
> > mechanisms and "core" policies, e.g. to manage VTL0 memory, teardown (turn off?)
> > VTL1 on kexec(), etc.  But everything else could live in the guest kernel itself.
> > E.g. to have CR pinning play nice with kexec(), toss the relevant kexec() code into
> > VTL1.  That way VTL1 can verify the kexec() target and tear itself down before
> > jumping into the new kernel. 
> > 
> > This is very off the cuff and have-wavy, e.g. I don't have much of an idea what
> > it would take to harden kernel text patching, but keeping the policy in the guest
> > seems like it'd make everything more tractable than trying to define an ABI
> > between Linux and a VMM that is rich and flexible enough to support all the
> > fancy things Linux does (and will do in the future).
> 
> Yes, we agree that the guest needs to manage its own policy.  That's why
> we implemented Heki for KVM this way, but without VTLs because KVM
> doesn't support them.
> 
> To sum up, is the VTL approach the only one that would be acceptable for
> KVM?  

Heh, that's not a question you want to be asking.  You're effectively asking me
to make an authorative, "final" decision on a topic which I am only passingly
familiar with.

But since you asked it... :-)  Probably?

I see a lot of advantages to a VTL/VSM-like approach:

 1. Provides Linux-as-a guest the flexibility it needs to meaningfully advance
    its security, with the least amount of policy built into the guest/host ABI.

 2. Largely decouples guest policy from the host, i.e. should allow the guest to
    evolve/update it's policy without needing to coordinate changes with the host.

 3. The KVM implementation can be generic enough to be reusable for other features.

 4. Other groups are already working on VTL-like support in KVM, e.g. for VSM
    itself, and potentially for VMPL/SVSM support.

IMO, #2 is a *huge* selling point.  Not having to coordinate changes across
multiple code bases and/or organizations and/or maintainers is a big win for
velocity, long term maintenance, and probably the very viability of HEKI.

Providing the guest with the tools to define and implement its own policy means
end users don't have to way for some third party, e.g. CSPs, to deploy the
accompanying host-side changes, because there are no host-side changes.

And encapsulating everything in the guest drastically reduces the friction with
changes in the kernel that interact with hardening, both from a technical and a
social perspective.  I.e. giving the kernel (near) complete control over its
destiny minimizes the number of moving parts, and will be far, far easier to sell
to maintainers.  I would expect maintainers to react much more favorably to being
handed tools to harden the kernel, as opposed to being presented a set of APIs
that can be used to make the kernel compliant with _someone else's_ vision of
what kernel hardening should look like.

E.g. imagine a new feature comes along that requires overriding CR0/CR4 pinning
in a way that doesn't fit into existing policy.  If the VMM is involved in
defining/enforcing the CR pinning policy, then supporting said new feature would
require new guest/host ABI and an updated host VMM in order to make the new
feature compatible with HEKI.  Inevitably, even if everything goes smoothly from
an upstreaming perspective, that will result in guests that have to choose between
HEKI and new feature X, because there is zero chance that all hosts that run Linux
as a guest will be updated in advance of new feature X being deployed.

And if/when things don't go smoothly, odds are very good that kernel maintainers
will eventually tire of having to coordinate and negotiate with QEMU and other
VMMs, and will become resistant to continuing to support/extend HEKI.

> If yes, that would indeed require a *lot* of work for something we're not
> sure will be accepted later on.

Yes and no.  The AWS folks are pursuing VSM support in KVM+QEMU, and SVSM support
is trending toward the paired VM+vCPU model.  IMO, it's entirely feasible to
design KVM support such that much of the development load can be shared between
the projects.  And having 2+ use cases for a feature (set) makes it _much_ more
likely that the feature(s) will be accepted.

And similar to what Paolo said regarding HEKI not having a complete story, I
don't see a clear line of sight for landing host-defined policy enforcement, as
there are many open, non-trivial questions that need answers. I.e. upstreaming
HEKI in its current form is also far from a done deal, and isn't guaranteed to
be substantially less work when all is said and done.