'context imbalance' warnings

Luc Van Oostenryck <luc.vanoostenryck@xxxxxxxxx> · Mon, 9 Nov 2020 22:16:16 +0100

Hi,

This is the continuation of last week thread.

I've added a few more simplifications for selects & compares on top
of those from last week. Alas, none of these makes any real difference:
in my usual tests I have just one warning less (total number is now 995):
   * fs/dcache.c:1226:24: warning: context imbalance in 'dentry_lru_isolate_shrink' - different lock contexts for basic block

But well, these simplifications are good and can help.

So, I've taken again a look at the origin of these warnings (I already
did this 2-3 years ago). Here is what I found for the 6 first ones:

1) intel_{start,stop}_scheduling()
   - take/release a lock conditionally
   - but intel_start_scheduling() returns void ...

2) get_locked_pte()
   - it's a conditional lock
   - some code assume it never fails
   - some code protect it by a [VM_]BUG_ON()
   - sparse complains

3) lock_vector_lock()
   - OK, but the annotation is missing, I'll send a fix

4) raw_spin_trylock()
   - on UP, raw_spin_trylock() takes the context twice ...
   - I'll see what can be fixed there

5) acpi_os_read_memory()
   the usage pattern is 'interesting':

	flag = 0;
	lock();
	addr = lookup();
	if (!addr) {
		unlock()
		addr = map();
		if (!addr)
			return ERROR;
		flag = true;
	}

	// sparse complains here

	... use addr ...

	if (flag)
		unmap(addr);
	else
		unlock();
	return 0;

   I see the pattern and I suppose the locking is correct but I
   think that sparse's warning is justified. Is it needed to do
   the early unlock() when the lookup fails?

6) clk_enable_lock() (@ drivers/clk/clk.c)
   This one is really interesting and shows one of Sparse's shortcoming

	if (!spin_trylock_irqsave(&lock, flags)) {
		if (some_test) {
			__acquire;
			local_save_flags(flags);
			return flags;
		}
		spin_lock_irqsave(&enable_lock, flags);
	}
	...
	return;

   So the code is correct but what sparse really sees is:

	flags = __raw_save_flags();
	cli;
	tmp = _raw_spin_try_lock();
	if (tmp)
		__acquire;
	else
		__raw_restore_flags(flags);

	// sparse complains here

	if (!tmp) {
		if (some_test) {
			__acquire;
			flags = __raw_save_flags();
			return flags;
		}
		flags = raw_spin_lock_irqsave();
	}
	...
	return flags;

   Now sparse could merge the two branches depending on tmp,
   as if the code would have been written as:

	flags = __raw_save_flags();
	cli;
	tmp = _raw_spin_try_lock();
	if (tmp)
		__acquire;
	else {
		__raw_restore_flags(flags);
		if (some_test) {
			__acquire;
			flags = __raw_save_flags();
			return flags;
		}
		flags = raw_spin_lock_irqsave();
	}
	...
	return flags;

   But the corresponding IR is:

	... %r9(flags) <- ...			// flags = __raw_save_flags();
	...					// cli();
	call.32	%r10(tmp) <- _raw_spin_trylock	// tmp = _raw_spin_try_lock();
	cbr	%r10(tmp), .L1, .L2		// if (tmp)
.L1:	context	1				//   __acquire;
	br	.L3
.L2:	<restore interrupts>			//   __raw_restore_flags(flags);
	br	.L3
.L3:	phisrc.32 %phi1(flags) <- %r9(flags)
	cbr	%r10(tmp), .L4, .L5		// if (!tmp)

   The presence of the phisrc at .L3 blocks the BB merging.
   But this is only part of the problem. First:
   1) The phisrc (and maybe the associated phi-node) could
      be moved/adjusted when merging the BBs
   2) phisrcs are not really needed anyway (they're just a
      mean to track the BB of each phi-node's arguments).

   But even without these phi-sources, some merging that could
   happen doesn't.  For example, code like:
	#define __acquire	__context__(X, 1)
	int  def(void);
	void use(int);
	int foo(int a)
	{
		int tmp = def();
		if (a)
			__acquire;
		else
			use(tmp);
		if (!a) {
			def();
			__acquire;
		}
		use(0);
		__release;
		return tmp;
	}

   produces:
	foo:
	.L0:	call.32     %r1 <- def
		cbr         %arg1, .L1, .L2
	.L1:	context     1
		br          .L3
	.L2:	call        use, %r1
		br          .L3
	.L3:	cbr         %arg1, .L5, .L4	;; false imbalance here
	.L4:	call.32     %r7 <- def
		context     1
		br          .L5
	.L5:	call        use, $0
		context     -1
		ret.32      %r1

   There is no phi-source here and the CBR at .L3 could be merged
   with the one at .L0, removing the false imbalance but it's not.
   I thought that sparse was doing this sort of branch simplification
   but it doesn't, or at least it doesn't in this (simple) situation.

   I'll begin to look at this. I'm not sure about a general solution
   but for a simple case like here (just a 'diamond') it seems simple
   enough to handle.

-- Luc