Hi Jim,
On 20.08.20 19:30, Jim Mattson wrote:
> On Wed, Aug 19, 2020 at 2:00 AM Alexander Graf <graf@xxxxxxxxxx> wrote:
>> Why would we still need this with the allow list and user space #GP
>> deflection logic in place?
>
> Conversion to an allow list is cumbersome when you have a short deny
> list. Suppose that I want to implement the following deny list:
> {IA32_ARCH_CAPABILITIES, HV_X64_MSR_REFERENCE_TSC,
> MSR_GOOGLE_TRUE_TIME, MSR_GOOGLE_FDR_TRACE, MSR_GOOGLE_HBI}. What
> would the corresponding allow list look like? Given your current
> implementation, I don't think the corresponding allow list can
> actually be constructed. I want to allow 2^32-5 MSRs, but I can allow
> at most 122880, if I've done the math correctly. (10 ranges, each
> spanning at most 0x600 bytes worth of bitmap.)

There are only very few MSR ranges that actually hold data. So in your case,
to allow all MSRs that Linux knows about in msr-index.h, you would need
[0x00000000 - 0x00002000]
[0x40000000 - 0x40000200]
[0x4b564d00 - 0x4b564e00]
[0x80868000 - 0x80868020]
[0xc0000000 - 0xc0000200]
[0xc0010000 - 0xc0012000]
[0xc0020000 - 0xc0020010]
which are 7 regions. For good measure, you can probably pad every one of
them to the full 0x3000 MSRs they can span.
For MSRs that KVM actually handles in-kernel (others don't need to be
allowed), the list shrinks to 5:
[0x00000000 - 0x00001000]
[0x40000000 - 0x40000200]
[0x4b564d00 - 0x4b564e00]
[0xc0000000 - 0xc0000200]
[0xc0010000 - 0xc0012000]
Let's extend them a bit to make reasoning easier:
[0x00000000 - 0x00003000]
[0x40000000 - 0x40003000]
[0x4b564d00 - 0x4b567000]
[0xc0000000 - 0xc0003000]
[0xc0010000 - 0xc0013000]
What are the odds that you will want to implicitly (without a new CAP,
which would need user space adjustments anyway) have a random new MSR
handled in-kernel with an identifier that is outside of those ranges?
I'm fairly confident that trends towards 0.
The only real downside I can see is that we just wasted ~8 KiB of RAM.
Nothing I would really get hung up on though.
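
As a sanity check on the numbers above, here is a minimal sketch that recomputes the per-range capacity and the bitmap memory for the five padded ranges. It assumes one bit per MSR, treats each range end as exclusive, and takes the 0x600-byte per-range limit from earlier in the thread; the struct and function names are made up for illustration and are not part of any KVM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Limit quoted in the thread: at most 0x600 bytes of bitmap per range. */
#define MAX_BITMAP_BYTES   0x600u
/* One bit per MSR, so a single range covers at most 0x3000 MSRs. */
#define MAX_MSRS_PER_RANGE (MAX_BITMAP_BYTES * 8u)

/* Hypothetical helper type; 'end' is treated as exclusive (an assumption). */
struct msr_range {
    uint32_t start;
    uint32_t end;
};

/* The five padded ranges from the list above. */
static const struct msr_range padded_ranges[] = {
    { 0x00000000u, 0x00003000u },
    { 0x40000000u, 0x40003000u },
    { 0x4b564d00u, 0x4b567000u },
    { 0xc0000000u, 0xc0003000u },
    { 0xc0010000u, 0xc0013000u },
};

#define NUM_PADDED_RANGES (sizeof(padded_ranges) / sizeof(padded_ranges[0]))

/* Bytes of bitmap needed for one range, rounded up to whole bytes. */
size_t range_bitmap_bytes(const struct msr_range *r)
{
    uint32_t nmsrs;

    assert(r->end > r->start);
    nmsrs = r->end - r->start;
    assert(nmsrs <= MAX_MSRS_PER_RANGE);
    return (nmsrs + 7u) / 8u;
}

/* Total bitmap bytes across all padded ranges. */
size_t total_bitmap_bytes(void)
{
    size_t total = 0;

    for (size_t i = 0; i < NUM_PADDED_RANGES; i++)
        total += range_bitmap_bytes(&padded_ranges[i]);
    return total;
}
```

Under these assumptions the five ranges need 7264 bytes of bitmap if sized exactly, or 5 * 0x600 = 7680 bytes if every range gets a full-size bitmap, which is in line with the rough estimate above.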

> Perhaps we should adopt allow/deny rules similar to those accepted by
> most firewalls. Instead of ports, we have MSR indices. Instead of
> protocols, we have READ, WRITE, or READ/WRITE. Suppose that we
> supported up to <n> rules of the form: {start index, end index, access
> modes, allow or deny}? Rules would be processed in the order given,
> and the first rule that matched a given access would take precedence.
> Finally, userspace could specify the default behavior (either allow or
> deny) for any MSR access that didn't match any of the rules.
>
> Thoughts?

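
For reference, the first-match semantics proposed above could be sketched roughly as follows. This is only an illustration of the rule format {start index, end index, access modes, allow or deny} with a userspace-chosen default; the type, macro, and function names are hypothetical and do not correspond to an existing KVM interface, and both range bounds are assumed to be inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access-mode bits, for illustration only. */
#define MSR_RULE_READ  (1u << 0)
#define MSR_RULE_WRITE (1u << 1)

/* One firewall-style rule. */
struct msr_rule {
    uint32_t start; /* first MSR index, inclusive (assumption) */
    uint32_t end;   /* last MSR index, inclusive (assumption) */
    uint32_t modes; /* MSR_RULE_READ, MSR_RULE_WRITE, or both */
    bool allow;     /* true = allow, false = deny */
};

/*
 * Walk the rules in the order given; the first rule whose range and
 * access mode match decides. If nothing matches, use the default.
 */
bool msr_access_allowed(const struct msr_rule *rules, size_t nrules,
                        bool default_allow, uint32_t msr, uint32_t mode)
{
    /* Each lookup is for exactly one access type. */
    assert(mode == MSR_RULE_READ || mode == MSR_RULE_WRITE);

    for (size_t i = 0; i < nrules; i++) {
        const struct msr_rule *r = &rules[i];

        if (msr >= r->start && msr <= r->end && (r->modes & mode))
            return r->allow;
    }
    return default_allow;
}
```

With this shape, a short deny list is a handful of deny rules plus a default of allow, while a pure allow list needs one rule per contiguous range.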
That wouldn't scale well if you want to allow all architecturally useful
MSRs in a purely allow list fashion. You'd have to create hundreds of
rules - or at least a few dozen if you combine contiguous ranges.
If you really desperately believe a deny list is a better fit for your
use case, we could redesign the interface differently:
struct msr_set_accesslist {
#define MSR_ACCESSLIST_DEFAULT_ALLOW 0
#define MSR_ACCESSLIST_DEFAULT_DENY 1
    u32 flags;
    struct {
        u32 flags;
        u32 nmsrs; /* MSRs in bitmap */
        u32 base; /* first MSR address to bitmap */
        void *bitmap; /* pointer to bitmap, 1 means allow, 0 deny */
    } lists[10];
};
which means in your use case, you can do
u64 deny = 0; /* bit 0 clear: deny the single MSR at .base */
struct msr_set_accesslist access = {
    .flags = MSR_ACCESSLIST_DEFAULT_ALLOW,
    .lists = {
        {
            .nmsrs = 1,
            .base = IA32_ARCH_CAPABILITIES,
            .bitmap = &deny,
        }, {
            .nmsrs = 1,
            .base = HV_X64_MSR_REFERENCE_TSC,
            .bitmap = &deny,
        }, {
            .nmsrs = 1,
            /* can probably be combined with the ones below? */
            .base = MSR_GOOGLE_TRUE_TIME,
            .bitmap = &deny,
        }, {
            .nmsrs = 1,
            .base = MSR_GOOGLE_FDR_TRACE,
            .bitmap = &deny,
        }, {
            .nmsrs = 1,
            .base = MSR_GOOGLE_HBI,
            .bitmap = &deny,
        },
    }
};
msr_set_accesslist(kvm_fd, &access);
while I can do the same dance as before, but with a single call rather
than multiple ones.
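
To make the intended semantics concrete, here is a rough sketch of how a lookup against such an access list might work. It is a userspace-style mock, not kernel code: the struct is a stand-in that mirrors the layout proposed above with fixed-width types, the function name is hypothetical, and it assumes that an MSR covered by a list is decided by its bit (1 allow, 0 deny), that bit n lives in byte n / 8 of the bitmap, and that anything not covered falls back to the default flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_ACCESSLIST_DEFAULT_ALLOW 0
#define MSR_ACCESSLIST_DEFAULT_DENY  1
#define MSR_ACCESSLIST_MAX_LISTS     10

/* Stand-in for one entry of the proposed lists[] array. */
struct msr_accesslist_entry {
    uint32_t flags;
    uint32_t nmsrs;     /* MSRs in bitmap */
    uint32_t base;      /* first MSR address covered by the bitmap */
    const void *bitmap; /* 1 means allow, 0 means deny */
};

/* Stand-in for the proposed struct msr_set_accesslist. */
struct msr_accesslist {
    uint32_t flags;
    struct msr_accesslist_entry lists[MSR_ACCESSLIST_MAX_LISTS];
};

/* Hypothetical lookup: returns true if access to 'msr' is allowed. */
bool msr_accesslist_allows(const struct msr_accesslist *al, uint32_t msr)
{
    assert(al != NULL);

    for (size_t i = 0; i < MSR_ACCESSLIST_MAX_LISTS; i++) {
        const struct msr_accesslist_entry *e = &al->lists[i];
        const uint8_t *bits = e->bitmap;
        uint32_t bit;

        /* Unused slots have nmsrs == 0 and therefore never match. */
        if (msr < e->base || msr - e->base >= e->nmsrs)
            continue;

        bit = msr - e->base;
        return (bits[bit / 8] >> (bit % 8)) & 1u;
    }

    /* No list covers this MSR: fall back to the default behavior. */
    return al->flags == MSR_ACCESSLIST_DEFAULT_ALLOW;
}
```

With a default of allow and single-MSR lists whose bitmap is all zeroes, this behaves like a deny list; with a default of deny and large bitmaps of ones, it behaves like the allow list discussed earlier.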
What do you think?
Alex