On Wed, Apr 08, 2020 at 03:37:57PM -0700, Paul E. McKenney wrote: > On Wed, Apr 08, 2020 at 02:14:38PM -0700, Linus Torvalds wrote: > > On Wed, Apr 8, 2020 at 1:26 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > OK, it sounds like we need to specify what needs to be present in this > > > sort of commit log. > > > > That would have helped. > > > > But in this case, after having looked more at it, it wasn't that the > > commit log was not sufficient, it was that the change was wrong. > > > > With a better commit log and a pointer to the KCSAN report, I would > > have been able to make a better judgment call earlier, and not have > > had to ask for more information. > > > > But once I figured out what the problem was, it was clear that > > > > (a) what the i915 driver does is at a minimum questionable, and quite > > likely actively buggy > > > > (b) even if it's not buggy in the i915 driver, changing the list > > traversal macros to use READ_ONCE() would hide other places where it > > most definitely is buggy. > > Yeah, and I should especially have spotted this last one, given how many > times I have run into it while using KCSAN within RCU. ;-/ > > In any case, we have review comments, discussion, and the occasional > controvery for non-KCSAN bug fixes, so it is only reasonable to expect > a similar process for KCSAN bug reports, right? My main goal here is > to make things easier for people reviewing future KCSAN-inspired fixes. > > > > o The KCSAN splat, optionally including file/line numbers. > > > > I do think that the KCSAN splat - very preferably simplified and with > > explanations - should basically always be included if KCSAN was the > > reason. > > > > Of course, the "simplified and with explanations" may be simplified to > > the point where none of the original splat even remains. Those splats > > aren't so legible that anybody should feel like they should be > > included. > > > > Put another way: an analysis that is so thorough that it makes the raw > > splat data pointless is a *good* thing, and if that means that no > > splat is needed, all the better. > > Fair point. The KCSAN splat has the added advantage of immediately > answering the "but does this really happen?" question, but then again > that question could also be answered by just stating that the issue was > found using KCSAN. > > > > o Detailed description of the problem this splat identifies, for > > > example, the code might fail to acquire a necessary lock, a plain > > > load might vulnerable to compiler optimizations, and so on. > > > > See above: if the description is detailed enough, then the splat > > itself becomes much less interesting. > > > > But in this case, for example, it's worth pointing out that the "safe" > > list accessors are used all over the kernel, and they are very much > > _not_ thread-safe. The "safe" in them is purely about the current > > thread removing an entry, not some kind of general thread-safeness. > > > > Which is why I decided I really hated that patch - it basically would > > make KCSAN unable to find _real_ races elsewhere, because it hid a > > very special race in the i915 driver. > > > > So I started out with the "that can't be right" kind of general > > feeling of uneasiness about the patch. I ended up with a much more > > specific "no, that's very much not right" and no amount of commit log > > should have saved it. > > > > As mentioned, I suspect that the i915 driver could actually use the > > RCU-safe list walkers. Their particular use is not about RCU per se, > > but it has some very similar rules about walking a list concurrently > > with somebody adding an entry to it. > > > > And the RCU list walkers do end up doing special things to load the > > next pointer. > > > > Whether we want to say - and document - that "ok, the RCU list walkers > > are also useful for this non-RCU use case" in general might be worth > > discussing. This might work quite well, given the lockdep annotations that Joel added to list_for_each_entry_rcu(). For example, lockdep_assert_held() would tell it that it didn't need to be in an RCU reader as long as the lock passed to lockdep_assert_held() was held at that point. > > It may be that the i915 use case is so special that it's only worth > > documenting there. > > If documenting is the right approach, KCSAN's data_race() could be > thought of as KCSAN-visible documentation. > > I will touch base with Chris separately. Of these options, what looks best to you? Or would something else work better? Thanx, Paul