Re: [tip: x86/entry] x86/entry: Avoid redundant CR3 write on paranoid returns

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Apologies if you see this as a duplicate, accidentally sent the
original in HTML, please disregard the other one]

Hi Thomas,

I have just noticed that the commit has disappeared from
tip/x86/entry. Is that deliberate?

Thanks,
Brendan


On Wed, 24 Jan 2024 at 19:36, tip-bot2 for Lai Jiangshan
<tip-bot2@xxxxxxxxxxxxx> wrote:
>
> The following commit has been merged into the x86/entry branch of tip:
>
> Commit-ID:     bb998361999e79bc87dae1ebe0f5bf317f632585
> Gitweb:        https://git.kernel.org/tip/bb998361999e79bc87dae1ebe0f5bf317f632585
> Author:        Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> AuthorDate:    Mon, 08 Jan 2024 11:39:50
> Committer:     Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitterDate: Wed, 24 Jan 2024 13:57:59 +01:00
>
> x86/entry: Avoid redundant CR3 write on paranoid returns
>
> The CR3 restore happens in:
>
>   1. #NMI return.
>   2. paranoid_exit() (i.e. #MCE, #VC, #DB and #DF return)
>
> Contrary to the implication in commit 21e94459110252 ("x86/mm: Optimize
> RESTORE_CR3"), the kernel never modifies CR3 in any of these exceptions,
> except for switching from user to kernel pagetables under PTI. That
> means that most of the time when returning from an exception that
> interrupted the kernel no CR3 restore is necessary. Writing CR3 is
> expensive on some machines.
>
> Most of the time because the interrupt might have come during kernel entry
> before the user to kernel CR3 switch or the during exit after the kernel to
> user switch. In the former case skipping the restore would be correct, but
> definitely not for the latter.
>
> So check the saved CR3 value and restore it only, if it is a user CR3.
>
> Give the macro a new name to clarify its usage, and remove a comment that
> was describing the original behaviour along with the not longer needed jump
> label.
>
> Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Brendan Jackman <jackmanb@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Link: https://lore.kernel.org/r/20240108113950.360438-1-jackmanb@xxxxxxxxxx
>
> [Rewrote commit message; responded to review comments]
> Change-Id: I6e56978c4753fb943a7897ff101f519514fa0827
> ---
>  arch/x86/entry/calling.h  | 26 ++++++++++----------------
>  arch/x86/entry/entry_64.S |  7 +++----
>  2 files changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 9f1d947..92dca4a 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -239,17 +239,19 @@ For 32-bit we have the following conventions - kernel is built with
>  .Ldone_\@:
>  .endm
>
> -.macro RESTORE_CR3 scratch_reg:req save_reg:req
> +/* Restore CR3 from a kernel context. May restore a user CR3 value. */
> +.macro PARANOID_RESTORE_CR3 scratch_reg:req save_reg:req
>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>
> -       ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
> -
>         /*
> -        * KERNEL pages can always resume with NOFLUSH as we do
> -        * explicit flushes.
> +        * If CR3 contained the kernel page tables at the paranoid exception
> +        * entry, then there is nothing to restore as CR3 is not modified while
> +        * handling the exception.
>          */
>         bt      $PTI_USER_PGTABLE_BIT, \save_reg
> -       jnc     .Lnoflush_\@
> +       jnc     .Lend_\@
> +
> +       ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
>
>         /*
>          * Check if there's a pending flush for the user ASID we're
> @@ -257,20 +259,12 @@ For 32-bit we have the following conventions - kernel is built with
>          */
>         movq    \save_reg, \scratch_reg
>         andq    $(0x7FF), \scratch_reg
> -       bt      \scratch_reg, THIS_CPU_user_pcid_flush_mask
> -       jnc     .Lnoflush_\@
> -
>         btr     \scratch_reg, THIS_CPU_user_pcid_flush_mask
> -       jmp     .Lwrcr3_\@
> +       jc      .Lwrcr3_\@
>
> -.Lnoflush_\@:
>         SET_NOFLUSH_BIT \save_reg
>
>  .Lwrcr3_\@:
> -       /*
> -        * The CR3 write could be avoided when not changing its value,
> -        * but would require a CR3 read *and* a scratch register.
> -        */
>         movq    \save_reg, %cr3
>  .Lend_\@:
>  .endm
> @@ -285,7 +279,7 @@ For 32-bit we have the following conventions - kernel is built with
>  .endm
>  .macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req
>  .endm
> -.macro RESTORE_CR3 scratch_reg:req save_reg:req
> +.macro PARANOID_RESTORE_CR3 scratch_reg:req save_reg:req
>  .endm
>
>  #endif
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index c40f89a..aedd169 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -968,14 +968,14 @@ SYM_CODE_START_LOCAL(paranoid_exit)
>         IBRS_EXIT save_reg=%r15
>
>         /*
> -        * The order of operations is important. RESTORE_CR3 requires
> +        * The order of operations is important. PARANOID_RESTORE_CR3 requires
>          * kernel GSBASE.
>          *
>          * NB to anyone to try to optimize this code: this code does
>          * not execute at all for exceptions from user mode. Those
>          * exceptions go through error_return instead.
>          */
> -       RESTORE_CR3     scratch_reg=%rax save_reg=%r14
> +       PARANOID_RESTORE_CR3 scratch_reg=%rax save_reg=%r14
>
>         /* Handle the three GSBASE cases */
>         ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
> @@ -1404,8 +1404,7 @@ end_repeat_nmi:
>         /* Always restore stashed SPEC_CTRL value (see paranoid_entry) */
>         IBRS_EXIT save_reg=%r15
>
> -       /* Always restore stashed CR3 value (see paranoid_entry) */
> -       RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
> +       PARANOID_RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
>
>         /*
>          * The above invocation of paranoid_entry stored the GSBASE




[Index of Archives]     [Linux Stable Commits]     [Linux Stable Kernel]     [Linux Kernel]     [Linux USB Devel]     [Linux Video &Media]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux