Re: [RFCv2 07/10] x86/mm: Handle tagged memory accesses from kernel threads

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Thu, 12 May 2022 15:30:01 +0200

On Wed, May 11 2022 at 05:27, Kirill A. Shutemov wrote:
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index f9fe71d1f42c..b320556e1c22 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -185,6 +185,34 @@ static u8 gen_lam(struct task_struct *tsk, struct mm_struct *mm)
>  	if (!tsk)
>  		return LAM_NONE;
>  
> +	if (tsk->flags & PF_KTHREAD) {
> +		/*
> +		 * For kernel thread use the most permissive LAM
> +		 * used by the mm. It's required to handle kernel thread
> +		 * memory accesses on behalf of a process.
> +		 *
> +		 * Adjust thread flags accodringly, so untagged_addr() would
> +		 * work correctly.
> +		 */
> +
> +		tsk->thread.features &= ~(X86_THREAD_LAM_U48 |
> +					  X86_THREAD_LAM_U57);
> +
> +		switch (mm->context.lam) {
> +		case LAM_NONE:
> +			return LAM_NONE;
> +		case LAM_U57:
> +			tsk->thread.features |= X86_THREAD_LAM_U57;
> +			return LAM_U57;
> +		case LAM_U48:
> +			tsk->thread.features |= X86_THREAD_LAM_U48;
> +			return LAM_U48;

Pretending that LAM is configurable per thread and then having a magic
override in the per process mm when accessing that process' memory from
a kernel thread is inconsistent, a horrible hack and a recipe for
hard to diagnose problems.

LAM has to be enabled by the process _before_ creating threads and then
stay enabled until the whole thing dies. That's the only sensible use
case.

I understand that tsk->thread.features is conveniant for the untagging
mechanism, but the whole setup should be:

prctl(ENABLE, which)
     if (can_enable_lam(which)) {
     	mm->lam.c3_mask = CR3_LAM(which);
        mm->lam.untag_mask = UNTAG_LAM(which);
        current->thread.lam_untag_mask = mm->lam.untag_mask;
     }

and

can_enable_lam(which)
    if (current_is_multithreaded())
    	return -ETOOLATE;
    if (current->mm->lam_cr3_mask)
    	return -EBUSY;
    ....

Now vs. kernel threads. Doing this like the above is just the wrong
place. If a kernel thread accesses user space memory of a process then
it has to invoke kthread_use_mm(), right? So the obvious point to cache
that setting is in kthread_use_mm() and kthread_unuse_mm() clears it:

kthread_use_mm()
     current->thread.lam_untag_mask = mm->lam.untag_mask;

kthread_unuse_mm()
     current->thread.lam_untag_mask = 0;

This makes all of the mechanics trivial because CR3 switch then simply
does:

     new_cr3 |= mm->lam.c3_mask;

No conditionals and evaluations, nothing. Just straight forward and
comprehensible code.

Thanks,

        tglx