On Fri, Mar 19, 2021 at 11:15:42PM +0900, Tetsuo Handa wrote: > On 2021/03/12 0:54, Marco Elver wrote: > >> But the more we could have the compiler automatically figure out > >> things without needing an explicit tag, it would seem to me that this > >> would be better, since manual tagging is going to be more error-prone. > > > > What you're alluding to here would go much further than a data race > > detector ("data race" is still just defined by the memory model). The > > wish that there was a static analysis tool that would automatically > > understand the "concurrency semantics as intended by the developer" is > > something that'd be nice to have, but just doesn't seem realistic. > > Because how can a tool tell what the developer intended, without input > > from that developer? > > Input from developers is very important for not only compilers and tools > but also allowing bug-explorers to understand what is happening. > ext4 currently has > > possible deadlock in start_this_handle (2) > https://syzkaller.appspot.com/bug?id=38c060d5757cbc13fdffd46e80557c645fbe79ba > > which even maintainers cannot understand what is happening. > How can bug-explorers know implicit logic which maintainers believe safe and correct? > It is possible that some oversight in implicit logic is the cause of > "possible deadlock in start_this_handle (2)". > Making implicit assumptions clear helps understanding. Just to be clear, the above diagnostic is from lockdep rather than KCSAN. According to the sample crash result, different code paths acquire jdb2_handle and the __fs_reclaim_map in different orders. It looks to me that __fs_reclaim_map isn't really a lock, but rather a mode indicator. If so, lockdep should set it up accordingly, perhaps in a manner similar to rcu_lock_map. > Will "KCSAN: data-race in start_this_handle / start_this_handle" be addressed by marking? > syzbot is already waiting for > "KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata" at > https://syzkaller.appspot.com/bug?id=5eb10023f53097f003e72c6a7c1a6f14b7c22929 . The first thing is to work out what the code should be doing. What KCSAN is saying is that a variable is being locklessly updated. Is it really OK for that variable to be locklessly updated? If not, a larger fix is required. For more information, please see Marco's LWN series: https://lwn.net/Articles/816850/ and https://lwn.net/Articles/816854/ Alternatively, you can refer to the documentation being proposed for the Linux kernel tree: https://lore.kernel.org/lkml/20210304004543.25364-3-paulmck@xxxxxxxxxx/ > > If there's worry marking accesses is error-prone, then that might be a > > signal that the concurrency design is too complex (or the developer > > hasn't considered all cases). > > > > For that reason, we need to mark accesses to tell the compiler and > > tooling where to expect concurrency, so that 1) the compiler generates > > correct code, and 2) tooling such as KCSAN can double-check what the > > developer intended is actually what's happening. > > and 3) bug-explorers can understand what the developers are assuming/missing. If the above information doesn't help the bug explorers, please let me know. Thanx, Paul