As mentioned in previous patches, larger address spaces mean larger MPX tables. But, the entire system is either entirely using 5-level paging, or not. We do not mix pagetable formats. If the size of the MPX tables depended soley on the paging mode, old binaries would break because the format of the tables changed underneath them. So, since CR4 never changes, but we need some way to change the MPX table format, a new MSR is introduced: MSR_IA32_MPX_LAX. If we are in 5-level paging mode *and* the enable bit in this MSR is set, the CPU will use the new, larger MPX bounds table format. If 5-level paging is disabled, or the enable bit is clear, then the legacy-style smaller tables will be used. But, we might mix legacy and non-legacy binaries on the same system, so this MSR needs to be context-switched. Add code to do this, along with some simple optimizations to skip the MSR writes if the MSR does not need to be updated. --- b/arch/x86/include/asm/msr-index.h | 1 b/arch/x86/mm/tlb.c | 42 +++++++++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+) diff -puN arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr arch/x86/include/asm/msr-index.h --- a/arch/x86/include/asm/msr-index.h~mawa-050-context-switch-msr 2017-01-26 14:31:37.747902524 -0800 +++ b/arch/x86/include/asm/msr-index.h 2017-01-26 14:31:37.752902749 -0800 @@ -410,6 +410,7 @@ #define MSR_IA32_BNDCFGS 0x00000d90 #define MSR_IA32_XSS 0x00000da0 +#define MSR_IA32_MPX_LAX 0x00001000 #define FEATURE_CONTROL_LOCKED (1<<0) #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX (1<<1) diff -puN arch/x86/mm/tlb.c~mawa-050-context-switch-msr arch/x86/mm/tlb.c --- a/arch/x86/mm/tlb.c~mawa-050-context-switch-msr 2017-01-26 14:31:37.749902614 -0800 +++ b/arch/x86/mm/tlb.c 2017-01-26 14:31:37.753902794 -0800 @@ -71,6 +71,47 @@ void switch_mm(struct mm_struct *prev, s local_irq_restore(flags); } +/* + * The MPX tables change sizes based on the size of the virtual + * (aka. linear) address space. There is an MSR to tell the CPU + * whether we want the legacy-style ones or the larger ones when + * we are running with an eXtended virtual address space. + */ +static void switch_mawa(struct mm_struct *prev, struct mm_struct *next) +{ + /* + * Note: there is one and only one bit in use in the MSR + * at this time, so we do not have to be concerned with + * preseving any of the other bits. Just write 0 or 1. + */ + unsigned IA32_MPX_LAX_ENABLE_MASK = 0x00000001; + + if (!cpu_feature_enabled(X86_FEATURE_MPX)) + return; + /* + * FIXME: do we want a check here for the 5-level paging + * CR4 bit or CPUID bit, or is the mawa check below OK? + * It's not obvious what would be the fastest or if it + * matters. + */ + + /* + * Avoid the relatively costly MSR if we are not changing + * MAWA state. All processes not using MPX will have a + * mpx_mawa_shift()=0, so we do not need to check + * separately for whether MPX management is enabled. + */ + if (mpx_mawa_shift(prev) == mpx_mawa_shift(next)) + return; + + if (mpx_mawa_shift(next)) { + wrmsr(MSR_IA32_MPX_LAX, IA32_MPX_LAX_ENABLE_MASK, 0x0); + } else { + /* clear the enable bit: */ + wrmsr(MSR_IA32_MPX_LAX, 0x0, 0x0); + } +} + void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk) { @@ -136,6 +177,7 @@ void switch_mm_irqs_off(struct mm_struct /* Load per-mm CR4 state */ load_mm_cr4(next); + switch_mawa(prev, next); #ifdef CONFIG_MODIFY_LDT_SYSCALL /* * Load the LDT, if the LDT is different. _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>