On Wed, Apr 08, 2020 at 02:14:38PM -0700, Linus Torvalds wrote: > On Wed, Apr 8, 2020 at 1:26 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > OK, it sounds like we need to specify what needs to be present in this > > sort of commit log. > > That would have helped. > > But in this case, after having looked more at it, it wasn't that the > commit log was not sufficient, it was that the change was wrong. > > With a better commit log and a pointer to the KCSAN report, I would > have been able to make a better judgment call earlier, and not have > had to ask for more information. > > But once I figured out what the problem was, it was clear that > > (a) what the i915 driver does is at a minimum questionable, and quite > likely actively buggy > > (b) even if it's not buggy in the i915 driver, changing the list > traversal macros to use READ_ONCE() would hide other places where it > most definitely is buggy. Yeah, and I should especially have spotted this last one, given how many times I have run into it while using KCSAN within RCU. ;-/ In any case, we have review comments, discussion, and the occasional controvery for non-KCSAN bug fixes, so it is only reasonable to expect a similar process for KCSAN bug reports, right? My main goal here is to make things easier for people reviewing future KCSAN-inspired fixes. > > o The KCSAN splat, optionally including file/line numbers. > > I do think that the KCSAN splat - very preferably simplified and with > explanations - should basically always be included if KCSAN was the > reason. > > Of course, the "simplified and with explanations" may be simplified to > the point where none of the original splat even remains. Those splats > aren't so legible that anybody should feel like they should be > included. > > Put another way: an analysis that is so thorough that it makes the raw > splat data pointless is a *good* thing, and if that means that no > splat is needed, all the better. Fair point. The KCSAN splat has the added advantage of immediately answering the "but does this really happen?" question, but then again that question could also be answered by just stating that the issue was found using KCSAN. > > o Detailed description of the problem this splat identifies, for > > example, the code might fail to acquire a necessary lock, a plain > > load might vulnerable to compiler optimizations, and so on. > > See above: if the description is detailed enough, then the splat > itself becomes much less interesting. > > But in this case, for example, it's worth pointing out that the "safe" > list accessors are used all over the kernel, and they are very much > _not_ thread-safe. The "safe" in them is purely about the current > thread removing an entry, not some kind of general thread-safeness. > > Which is why I decided I really hated that patch - it basically would > make KCSAN unable to find _real_ races elsewhere, because it hid a > very special race in the i915 driver. > > So I started out with the "that can't be right" kind of general > feeling of uneasiness about the patch. I ended up with a much more > specific "no, that's very much not right" and no amount of commit log > should have saved it. > > As mentioned, I suspect that the i915 driver could actually use the > RCU-safe list walkers. Their particular use is not about RCU per se, > but it has some very similar rules about walking a list concurrently > with somebody adding an entry to it. > > And the RCU list walkers do end up doing special things to load the > next pointer. > > Whether we want to say - and document - that "ok, the RCU list walkers > are also useful for this non-RCU use case" in general might be worth > discussing. > > It may be that the i915 use case is so special that it's only worth > documenting there. If documenting is the right approach, KCSAN's data_race() could be thought of as KCSAN-visible documentation. I will touch base with Chris separately. Thanx, Paul