> On 20 Dec 2019, at 21:26, John Andersen <john.s.andersen@xxxxxxxxx> wrote:
>
> Paravirtualized Control Register pinning is a strengthened version of
> existing protections on the Write Protect, Supervisor Mode Execution /
> Access Protection, and User-Mode Instruction Prevention bits. The
> existing protections prevent native_write_cr*() functions from writing
> values which disable those bits. This patchset prevents any guest
> writes to control registers from disabling pinned bits, not just writes
> from native_write_cr*(). This stops attackers within the guest from
> using ROP to disable protection bits.
>
> https://web.archive.org/web/20171029060939/http://www.blackbunny.io/linux-kernel-x86-64-bypass-smep-kaslr-kptr_restric/
>
> The protection is implemented by adding MSRs to KVM which contain the
> bits that are allowed to be pinned, and the bits which are pinned. The
> guest or userspace can enable bit pinning by reading MSRs to check
> which bits are allowed to be pinned, and then writing MSRs to set which
> bits they want pinned.
>
> Other hypervisors such as HyperV have implemented similar protections
> for Control Registers and MSRs; which security researchers have found
> effective.
>
> https://www.abatchy.com/2018/01/kernel-exploitation-4
>

I think it's important to mention how Hyper-V implements this protection, as it does so with a very different architecture.
Hyper-V implements a set of PV APIs named VSM (Virtual Secure Mode) that allow a guest (partition) to separate itself into multiple security domains called VTLs (Virtual Trust Levels). The VSM API exposes an interface that lets higher VTLs control the execution of lower VTLs. In theory, VSM supports up to 16 VTLs, but Windows VBS (Virtualization Based Security), which is currently the only technology that utilises VSM, uses only 2 VTLs: VTL0 for most OS execution (Normal Mode) and VTL1 for secure OS execution (Secure Mode).

A higher VTL controls the execution of a lower VTL through the following VSM mechanisms:
1) Memory Access Protections: allow a higher VTL to restrict access to physical pages, either making them inaccessible or limiting them to certain permissions.
2) Secure Intercepts: allow a higher VTL to ask the hypervisor to intercept certain events in lower VTLs so that the higher VTL can handle them. This includes access to system registers (e.g. CRs and MSRs).

VBS uses the above mechanisms as follows:
a) Credential Guard: prevents pass-the-hash attacks. A VTL1 trustlet encrypts credentials with an encryption key stored in memory that only VTL1 can access.
b) HVCI (Hypervisor-protected Code Integrity): prevents execution of unsigned code. All EPT entries are marked NX until their signature is verified by a VTL1 service; once verified, the EPT entries are marked RO+X. (HVCI can also efficiently enforce code signing on Ring0 code only, by utilising the Intel MBEC or AMD GMET CPU features, which allow setting the NX bit on EPT entries based on guest CPL.)
c) KDP (Kernel Data Protection): after initialisation, marks certain pages as read-only in the VTL0 EPT.
d) kCFG (Kernel Control-Flow Guard): VTL1 protects the bitmap specifying valid indirect-branch targets by marking it read-only in the VTL0 EPT.
e) HyperGuard: VTL1 uses the Secure Intercepts mechanism to prevent VTL0 from modifying important system registers, including CR0 and CR4, as this patch does.
HyperGuard also implements a mechanism named NPIEP (Non-Privileged Instruction Execution Prevention), which prevents VTL0 Ring3 code from executing SIDT/SGDT/SLDT/STR to leak Ring0 addresses.

To sum up: in Hyper-V, the hypervisor exposes a relatively thin API that allows a guest to partition itself into multiple security domains (enforced by virtualization). Using this framework, it's possible to implement multiple OS-level protection mechanisms. Pinning certain registers to specific values, as this patch does, is only one of them.

That's why, as I also tried to say at the recent KVM Forum, I think KVM should consider exposing a VSM-like API to guests, to allow various guest OSes, including Linux, to implement VBS-like features. To decide what this API should look like, we need a broader discussion with the Linux security maintainers and the KVM maintainers about which security features we would like to implement using such an API and what their architecture should be. Then we can implement this API in KVM and gradually introduce more security features in Linux that utilise it.

Once Linux has security features implemented with this new KVM API, we could also consider implementing them on top of other similar hypervisor APIs, such as Hyper-V VSM. This would, for example, make Linux more secure when running on Microsoft Azure compute instances.

Therefore, I see this patch as a short-term solution that quickly gains real security value on a very specific issue. But if we are serious about improving Linux security using virtualization, we should have this broader discussion.

-Liran