On 1/24/23 09:57, Hernan Ponce de Leon wrote:
In this case, the value read is passed into cmpxchg_relaxed(), which
checks the value against memory. As Arjan noted, the only
compiler-and-silicon difference between data_race() and READ_ONCE()
is that use of data_race() might allow the compiler to do things like
tear the load, thus forcing the occasional spurious cmpxchg_relaxed()
failure. In contrast, LKMM (by design) throws up its hands when it sees
a data race. Something about not being eager to track the
idiosyncrasies of many compiler versions.
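
For concreteness, the pattern under discussion looks like this
(modeled on mark_rt_mutex_waiters() in kernel/locking/rtmutex.c; that
this is the exact function at issue is my assumption): a plain,
unmarked load feeds a relaxed CAS retry loop.

static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
{
	unsigned long owner, *p = (unsigned long *) &lock->owner;

	do {
		/* Plain load: the compiler is free to tear it. */
		owner = *p;
	} while (cmpxchg_relaxed(p, owner,
				 owner | RT_MUTEX_HAS_WAITERS) != owner);
}

If the load is torn, 'owner' can mismatch memory, the
cmpxchg_relaxed() fails, and the loop takes another spin.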
My approach in my own code is to use *_ONCE() unless it causes a visible
performance regression or confuses KCSAN. An example of the latter is
debug code, in which case use of data_race() avoids suppressing KCSAN
warnings (and, depending on the situation, also avoids false positives).
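
As an illustration of the debug-code case (a hypothetical helper, not
from the patch):

/* Diagnostic-only read of a field that is updated concurrently.
 * data_race() documents that the race is intentional, so KCSAN
 * neither warns here nor lets this access hide reports about the
 * accesses that actually matter. */
static void rt_mutex_debug_owner(struct rt_mutex_base *lock)
{
	pr_debug("owner: %lx\n",
		 data_race(*(unsigned long *)&lock->owner));
}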
I understand that *_ONCE() might prevent some compiler optimizations
and reduce performance in the general case. However, if I understand
your first paragraph correctly, in this particular case data_race()
could allow the CAS to fail more often, resulting in more spinning
iterations and degraded performance. Am I right?
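
To make the tearing scenario concrete, here is a hand-written sketch
of what a torn version of "owner = *p;" could amount to (hypothetical
mis-compilation, little-endian 64-bit layout assumed; not actual
compiler output):

/* The compiler splits the load into two 32-bit halves. */
u32 lo = ((u32 *)p)[0];
u32 hi = ((u32 *)p)[1];
owner = ((unsigned long)hi << 32) | lo;

A store from another CPU landing between the two half-loads yields an
'owner' value that was never in memory, so the subsequent
cmpxchg_relaxed() is guaranteed to fail and the loop spins once more.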
Except that your other email seems to also be arguing that additional
ordering is required. So is https://lkml.org/lkml/2023/1/20/702 really
sufficient by itself, or is additional ordering needed on top of it?
I do not claim that we need to mark the read to add the ordering that
is needed for correctness (mutual exclusion). What I claim in this
patch is that there is a data race, and since it can affect ordering
constraints in subtle ways, I consider it harmful and thus want to
fix it.
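
For the record, the fix amounts to marking the load (a sketch,
assuming the retry loop shown earlier is the code in question):

do {
	/* Marked load: no tearing, and the race is annotated for
	 * both KCSAN and LKMM. */
	owner = READ_ONCE(*p);
} while (cmpxchg_relaxed(p, owner,
			 owner | RT_MUTEX_HAS_WAITERS) != owner);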
What I explain in the other email is that if we fix the data race,
either the fence or the acquire store might be relaxed (because
marking the read gives us some extra ordering guarantees). If the race
is not fixed, both the fence and the acquire are needed according to
LKMM. The situation is different wrt the hardware models: there the
tool cannot find any violation even if we do not fix the race and we
relax the store / remove the fence.
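
For anyone who wants to check this kind of claim themselves, here is a
toy LKMM litmus test in that spirit (my own sketch for illustration,
not the rtmutex code and not the tests behind the claims above;
'owner' stands in for lock->owner):

C rtmutex-waiters-toy

(*
 * Can P1 take the "lock" with a relaxed cmpxchg and still miss
 * the write P0 made before releasing it?
 *)

{
	owner=1;
}

P0(int *data, int *owner)
{
	WRITE_ONCE(*data, 1);
	smp_store_release(owner, 0);
}

P1(int *data, int *owner)
{
	int r0;
	int r1;

	r0 = cmpxchg_relaxed(owner, 0, 1);
	r1 = READ_ONCE(*data);
}

exists (1:r0=0 /\ 1:r1=0)

Running it through herd7 with the kernel model
(herd7 -conf linux-kernel.cfg rtmutex-waiters-toy.litmus) asks whether
the bad outcome is reachable; swapping cmpxchg_relaxed() for
cmpxchg_acquire() should flip the verdict, which is exactly the
relaxation trade-off above.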
I would suggest doing it the way PeterZ proposed. Instead of set_bit(),
however, it is probably better to use atomic_long_or(), like

atomic_long_or(RT_MUTEX_HAS_WAITERS, (atomic_long_t *)&lock->owner);

(Non-value-returning atomic RMW ops are already fully unordered, so no
_relaxed variant is needed.) The mutex code stores the lock owner as an
atomic_long_t, so it is natural to treat &lock->owner as an
atomic_long_t here too.
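
Put together, the suggestion might look like this (a sketch, assuming
the function at issue is mark_rt_mutex_waiters(); whether the
unordered RMW is sufficient here is precisely the ordering question
discussed above):

static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
{
	/* One unordered RMW sets the waiters bit: no load +
	 * cmpxchg retry loop, hence no racy unmarked load. */
	atomic_long_or(RT_MUTEX_HAS_WAITERS,
		       (atomic_long_t *)&lock->owner);
}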
Cheers,
Longman