Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock

Fenghua Yu <fenghua.yu@xxxxxxxxx> · Wed, 26 Jun 2019 13:36:37 -0700

On Wed, Jun 26, 2019 at 10:20:05PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Jun 2019, Fenghua Yu wrote:
> > +
> > +static atomic_t split_lock_debug;
> > +
> > +void split_lock_disable(void)
> > +{
> > +	/* Disable split lock detection on this CPU */
> > +	this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> > +	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
> > +
> > +	/*
> > +	 * Use the atomic variable split_lock_debug to ensure only the
> > +	 * first CPU hitting split lock issue prints one single complete
> > +	 * warning. This also solves the race if the split-lock #AC fault
> > +	 * is re-triggered by NMI of perf context interrupting one
> > +	 * split-lock warning execution while the original WARN_ONCE() is
> > +	 * executing.
> > +	 */
> > +	if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0) {
> > +		WARN_ONCE(1, "split lock operation detected\n");
> > +		atomic_set(&split_lock_debug, 0);
> 
> What's the purpose of this atomic_set()?

atomic_set() releases the split_lock_debug flag after WARN_ONCE() is done.
The same split_lock_debug flag will be used in sysfs write for atomic
operation as well, as proposed by Ingo in https://lkml.org/lkml/2019/4/25/48
So that's why the flag needs to be cleared, right?

> 
> > +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> > +{
> > +	unsigned int trapnr = X86_TRAP_AC;
> > +	char str[] = "alignment check";
> > +	int signr = SIGBUS;
> > +
> > +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > +
> > +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> > +		return;
> > +
> > +	cond_local_irq_enable(regs);
> > +	if (!user_mode(regs) && static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> > +		/*
> > +		 * Only split locks can generate #AC from kernel mode.
> > +		 *
> > +		 * The split-lock detection feature is a one-shot
> > +		 * debugging facility, so we disable it immediately and
> > +		 * print a warning.
> > +		 *
> > +		 * This also solves the instruction restart problem: we
> > +		 * return the faulting instruction right after this it
> 
> we return the faulting instruction ... to the store so we get our deposit
> back :)
> 
>   the fault handler returns to the faulting instruction which will be then
>   executed without ....
> 
> Don't try to impersonate code, cpus or whatever. It doesn't make sense and
> confuses people.
> 
> > +		 * will be executed without generating another #AC fault
> > +		 * and getting into an infinite loop, instead it will
> > +		 * continue without side effects to the interrupted
> > +		 * execution context.
> 
> That last part 'instead .....' is redundant. It's entirely clear from the
> above that the faulting instruction is reexecuted ....
> 
> Please write concise comments and do try to repeat the same information
> with a different painting.

I copied the comment completely from Ingo's comment on v8:
https://lkml.org/lkml/2019/4/25/40

Thanks.

-Fenghua