Hi Andy,

thanks for your valuable feedback.

On Thu, Jul 12, 2018 at 02:09:45PM -0700, Andy Lutomirski wrote:
> > On Jul 11, 2018, at 4:29 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
> > -.macro SAVE_ALL pt_regs_ax=%eax
> > +.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
> >  	cld
> > +	/* Push segment registers and %eax */
> >  	PUSH_GS
> >  	pushl	%fs
> >  	pushl	%es
> >  	pushl	%ds
> >  	pushl	\pt_regs_ax
> > +
> > +	/* Load kernel segments */
> > +	movl	$(__USER_DS), %eax
> 
> If \pt_regs_ax != %eax, then this will behave oddly. Maybe it’s okay.
> But I don’t see why this change was needed at all.

This is a left-over from a previous approach I tried and then abandoned
later. You are right, it is not needed.

> > +/*
> > + * Called with pt_regs fully populated and kernel segments loaded,
> > + * so we can access PER_CPU and use the integer registers.
> > + *
> > + * We need to be very careful here with the %esp switch, because an NMI
> > + * can happen everywhere. If the NMI handler finds itself on the
> > + * entry-stack, it will overwrite the task-stack and everything we
> > + * copied there. So allocate the stack-frame on the task-stack and
> > + * switch to it before we do any copying.
> 
> Ick, right. Same with machine check, though. You could alternatively
> fix it by running NMIs on an irq stack if the irq count is zero. How
> confident are you that you got #MC right?

Pretty confident, #MC uses the exception entry path which also handles
entry-stack and user-cr3 correctly. It might go through the slow
paranoid exit path, but that's okay for #MC, I guess.

And when the #MC happens while we switch to the task stack and do the
copying, the same precautions as for NMI apply.

> > + */
> > +.macro SWITCH_TO_KERNEL_STACK
> > +
> > +	ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV
> > +
> > +	/* Are we on the entry stack? Bail out if not! */
> > +	movl	PER_CPU_VAR(cpu_entry_area), %edi
> > +	addl	$CPU_ENTRY_AREA_entry_stack, %edi
> > +	cmpl	%esp, %edi
> > +	jae	.Lend_\@
> 
> That’s an alarming assumption about the address space layout. How
> about an xor and an and instead of cmpl? As it stands, if the address
> layout ever changes, the failure may be rather subtle.

Right, I will implement a more restrictive check.

> Anyway, wouldn’t it be easier to solve this by just not switching
> stacks on entries from kernel mode and making the entry stack bigger?
> Stick an assertion in the scheduling code that we’re not on an entry
> stack, perhaps.

That will save us the check whether we are on the entry stack and
replace it with a check whether we are coming from user/vm86 mode.

I don't think that this will simplify things much, and I am a bit
afraid that it'll break unwritten assumptions elsewhere. It is probably
something we can look into later, separately from the basic pti-x32
enablement.

Thanks,

	Joerg
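
P.S.: To make the "more restrictive check" above concrete, here is a
rough, untested sketch of a layout-independent variant that uses a
range check instead of the single cmpl. It only assumes that the entry
stack lives at a fixed offset inside cpu_entry_area; SIZEOF_entry_stack
is a hypothetical asm-offsets constant for the size of the entry stack:

	/* %ecx = end of the entry stack */
	movl	PER_CPU_VAR(cpu_entry_area), %ecx
	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx

	/* %ecx = (end of entry stack) - %esp */
	subl	%esp, %ecx

	/* On the entry stack iff 0 <= %ecx < SIZEOF_entry_stack (unsigned) */
	cmpl	$SIZEOF_entry_stack, %ecx
	jae	.Lend_\@

Unlike the base-address compare, this bails out both when %esp is below
and when it is above the entry stack, so a changed address-space layout
would fail loudly instead of subtly.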