On Fri, Jul 20, 2018 at 9:22 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
> From: Joerg Roedel <jroedel@xxxxxxx>
>
> The code that switches from entry- to task-stack when we
> enter from kernel-mode copies the full entry-stack contents
> to the task-stack.
>
> That is because we don't trust that the entry-stack
> contents. But actually we can trust its contents if we are
> not scheduled between entry and exit.
>
> So do less copying and move only the ptregs over to the
> task-stack in this code-path.
>
> Suggested-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Signed-off-by: Joerg Roedel <jroedel@xxxxxxx>
> ---
>  arch/x86/entry/entry_32.S | 70 +++++++++++++++++++++++++----------------------
>  1 file changed, 38 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 2767c62..90166b2 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -469,33 +469,48 @@
>          * segment registers on the way back to user-space or when the
>          * sysenter handler runs with eflags.tf set.
>          *
> -        * When we switch to the task-stack here, we can't trust the
> -        * contents of the entry-stack anymore, as the exception handler
> -        * might be scheduled out or moved to another CPU. Therefore we
> -        * copy the complete entry-stack to the task-stack and set a
> -        * marker in the iret-frame (bit 31 of the CS dword) to detect
> -        * what we've done on the iret path.
> +        * When we switch to the task-stack here, we extend the
> +        * stack-frame we copy to include the entry-stack %esp and a
> +        * pseudo %ss value so that we have a full ptregs struct on the
> +        * stack. We set a marker in the frame (bit 31 of the CS dword).
>          *
> -        * On the iret path we copy everything back and switch to the
> -        * entry-stack, so that the interrupted kernel code-path
> -        * continues on the same stack it was interrupted with.
> +        * On the iret path we read %esp from the PT_OLDESP slot on the
> +        * stack and copy ptregs (except oldesp and oldss) to it, when
> +        * we find the marker set. Then we switch to the %esp we read,
> +        * so that the interrupted kernel code-path continues on the
> +        * same stack it was interrupted with.

Can you give an example of the exact scenario in which any of this
copying happens and why it's needed?  IMO you should just be able to
*run* on the entry stack without copying anything at all.
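
For reference, the mechanism described in the new comment can be
modelled in user-space C roughly as below.  This is only a sketch under
stated assumptions, not the code in entry_32.S: struct hw_frame,
struct full_frame and both function names are hypothetical, the frame
holds only a few illustrative registers, the saved %esp is modelled as
a plain pointer to the interrupted frame, and clearing the marker on
the way out is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker from the quoted comment: bit 31 of the CS dword. */
#define CS_FROM_ENTRY_STACK (UINT32_C(1) << 31)

/* Hypothetical, heavily simplified frame as it sits on the entry stack. */
struct hw_frame {
	uint32_t ax;
	uint32_t ip;
	uint32_t cs;
	uint32_t flags;
};

/*
 * Hypothetical extended frame on the task stack: the copied registers
 * plus the two extra slots (entry-stack %esp and a pseudo %ss).
 */
struct full_frame {
	struct hw_frame hw;
	struct hw_frame *oldesp;	/* models the PT_OLDESP slot */
	uint32_t oldss;			/* pseudo %ss, placeholder value here */
};

/* Entry path: copy the frame to the task stack and set the marker. */
static void enter_from_entry_stack(struct full_frame *task,
				   struct hw_frame *entry)
{
	assert(task != NULL && entry != NULL);
	task->hw = *entry;
	task->oldesp = entry;
	task->oldss = 0;
	task->hw.cs |= CS_FROM_ENTRY_STACK;
}

/*
 * Exit path: if the marker is set, copy the registers (but not oldesp
 * and oldss) back to the location read from oldesp and return it as
 * the stack to continue on.  Without the marker, stay where we are.
 */
static struct hw_frame *exit_to_interrupted_stack(struct full_frame *task)
{
	struct hw_frame *dst;

	assert(task != NULL);
	if (!(task->hw.cs & CS_FROM_ENTRY_STACK))
		return &task->hw;

	task->hw.cs &= ~CS_FROM_ENTRY_STACK;	/* assumption: marker is cleared */
	dst = task->oldesp;
	*dst = task->hw;
	return dst;
}
```

In this model a handler that changes a register in the task-stack copy
sees that change carried back to the interrupted frame on exit, and
only the register part of the frame is ever copied in either direction.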