Re: [RESEND RFC 0/2] Paravirtualized Control Register pinning

Liran Alon <liran.alon@xxxxxxxxxx> · Tue, 24 Dec 2019 22:35:43 +0200

> On 24 Dec 2019, at 21:44, Andersen, John S <john.s.andersen@xxxxxxxxx> wrote:
> 
> On Mon, 2019-12-23 at 16:48 +0200, Liran Alon wrote:
>>> On 20 Dec 2019, at 21:26, John Andersen <john.s.andersen@xxxxxxxxx>
>>> wrote:
>>> 
>>> Pinning is not active when running in SMM. Entering SMM disables
>>> pinned
>>> bits, writes to control registers within SMM would therefore
>>> trigger
>>> general protection faults if pinning was enforced.
>> 
>> For compatibility reasons, it’s reasonable that pinning won’t be
>> active when running in SMM.
>> However, I do think we should not allow vSMM code to change pinned
>> values when returning back from SMM.
>> This would prevent a vulnerable vSMI handler from modifying vSMM
>> state-area to modify CR4 when running outside of vSMM.
>> I believe in this case it’s legit to just forcibly restore original
>> CR0/CR4 pinned values. Ignoring vSMM changes.
>> 
> 
> In em_rsm could we just OR with the value of the PINNED MSRs right
> before the final return?

Not exactly.

If I understand correctly, the proposed mechanism should also allow pinning specific
system registers (e.g. CR0/CR4) bits to either being cleared or being set. Not necessarily being set.
As kvm_set_cr0() & kvm_set_cr4() were modified to only check if pinned bit changed value.

Therefore, you should modify enter_smm() to save current pinned bits values
and then silently restore their values on em_rsm().

> 
>>> The guest may never read pinned bits. If an attacker were to read
>>> the
>>> CR pinned MSRs, they might decide to preform another attack which
>>> would
>>> not cause a general protection fault.
>> 
>> I disagree with this statement.
>> An attacker knows what is the system it is attacking and can deduce
>> by that which bits it pinned…
>> Therefore, protecting from guest reading these is not important at
>> all.
>> 
> 
> Sure, I'll make it readable.
> 
>>> Should userspace expose the CR pining CPUID feature bit, it must
>>> zero CR
>>> pinned MSRs on reboot. If it does not, it runs the risk of having
>>> the
>>> guest enable pinning and subsequently cause general protection
>>> faults on
>>> next boot due to early boot code setting control registers to
>>> values
>>> which do not contain the pinned bits.
>> 
>> Why reset CR pinned MSRs by userspace instead of KVM INIT handling?
>> 
>>> When running with KVM guest support and paravirtualized CR pinning
>>> enabled, paravirtualized and existing pinning are setup at the same
>>> point on the boot CPU. Non-boot CPUs setup pinning upon
>>> identification.
>>> 
>>> Guests using the kexec system call currently do not support
>>> paravirtualized control register pinning. This is due to early boot
>>> code writing known good values to control registers, these values
>>> do
>>> not contain the protected bits. This is due to CPU feature
>>> identification being done at a later time, when the kernel properly
>>> checks if it can enable protections.
>>> 
>>> Most distributions enable kexec. However, kexec could be made boot
>>> time
>>> disableable. In this case if a user has disabled kexec at boot time
>>> the guest will request that paravirtualized control register
>>> pinning
>>> be enabled. This would expand the userbase to users of major
>>> distributions.
>>> 
>>> Paravirtualized CR pinning will likely be incompatible with kexec
>>> for
>>> the foreseeable future. Early boot code could possibly be changed
>>> to
>>> not clear protected bits. However, a kernel that requests CR bits
>>> be
>>> pinned can't know if the kernel it's kexecing has been updated to
>>> not
>>> clear protected bits. This would result in the kernel being kexec'd
>>> almost immediately receiving a general protection fault.
>> 
>> Instead of disabling kexec entirely, I think it makes more sense to
>> invent
>> some generic mechanism in which new kernel can describe to old kernel
>> a set of flags that specifies which features hand-over it supports.
>> One of them
>> being pinned CRs.
>> 
>> For example, isn’t this also relevant for IOMMU DMA protection?
>> i.e. Doesn’t old kernel need to know if it should disable or enable
>> IOMMU DMAR
>> before kexec to new kernel? Similar to EDK2 IOMMU DMA protection
>> hand-over?
> 
> Great idea.
> 
> Making kexec work will require changes to these files and maybe more:
> 
> arch/x86/boot/compressed/head_64.S
> arch/x86/kernel/head_64.S
> arch/x86/kernel/relocate_kernel_64.S
> 
> Which my previous attempts showed different results when running
> virtualized vs. unvirtualized. Specificity different behavior with SMAP
> and UMIP bits.

I didn’t understand why there should be different results when running virtualized or not.

What I suggested is to just add metadata in a vmlinux ELF section that will be designated to
describe vmlinux handover capabilities.

Then, kexec functionality can be modified to read this section before loading & jumping
to new kernel vmlinux.

In the context of this patch-set, this section will specify a flag of if new vmlinux supports
CR0/CR4 pinning hand-over. If not and current kernel already pinned these values,
kexec should just return failure.

Just for example, in the context of IOMMU DMA protection hand-over, kexec will
use another flag to new of new vmlinux supports IOMMU DMA protection hand-over,
and if not, make sure to disable IOMMU before jumping to new vmlinux.

> 
> This would be a longer process though. As validating that everything
> still works in both the VM and on physical hosts will be required. As
> it stands this patchset could pick up a fairly large userbase via the
> virtualized container projects. Should we pursue kexec in this patchset
> or a later one?

In my opinion, this should be handled as part of this patch-series.
Otherwise, you basically introduce a kernel change that breaks existing functionality
in unpredictable manner to user.

i.e. It’s ok of kernel would haven’t allowed to use system registers pinning functionality
unless user also configured at boot-time that it disables kexec. But it’s not ok for
kernel behaviour to change such that kexec suddenly crashes if kernel was upgraded
to now use system registers pinning.

My personal opinion though is that kexec should first be enhanced to read hand-over metadata
as I described. And only then apply this patch-series which also modifies kexec to define and use
one of the hand-over bits as I mentioned above. But I would like to hear additional opinions on this.

In addition, I would like to understand Linux security maintainers point of view on the comment
I mentioned in another reply regarding how this mechanism is implemented in Hyper-V.

-Liran

> 
> Thanks,
> John