Re: [RFC PATCH 2/3] x86/mm: make sure LAM is up-to-date during context switching

Dave Hansen <dave.hansen@xxxxxxxxx> · Thu, 7 Mar 2024 13:39:53 -0800

On 3/7/24 13:04, Yosry Ahmed wrote:
> I thought about doing inc_mm_tlb_gen() when LAM is updated, but it felt
> hacky and more importantly doesn't make it clear in switch_mm_irqs_off()
> that we correctly handle LAM updates. We can certainly add a comment,
> but I think an explicit check for CPU LAM vs. mm LAM is much clearer.
> 
> WDYT?

The mm generations are literally there so that if the mm changes that
all the CPUs know they need an update.  Changing LAM enabling is 100%
consistent with telling other CPUs that they need an update.

I'd be curious of Andy feels differently though.

>> Considering how fun this code path is, a little effort at an actual
>> reproduction would be really appreciated.
> 
> I tried reproducing it but gave up quickly. We need a certain sequence
> of events to happen:
> 
> CPU 1					CPU 2
> kthread_use_mm()
> 					/* user thread enables LAM */
> 					context_switch()
> context_switch() /* to user thread */

First, it would be fine to either create a new kthread for reproduction
purposes or to hack an existing one.  For instance, have have the LAM
prctl() take an extra ref on the mm and stick it in a global variable:

	mmgrab(current->mm);
	global_mm = current->mm;

Then in the kthread, grab the mm and use it:

	while (!global_mm);
	kthread_use_mm(global_mm);
	... check for the race
	mmdrop(global_mm);

You can also hackily wait for thread to move with a stupid spin loop:

	while (smp_processor_id() != 1);

and then actually move it with sched_setaffinity() from userspace.  That
can make it easier to get that series of events to happen in lockstep.