Re: [patch 125/166] lib/list: prevent compiler reloads inside 'safe' list iteration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 08, 2020 at 03:37:57PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 08, 2020 at 02:14:38PM -0700, Linus Torvalds wrote:
> > On Wed, Apr 8, 2020 at 1:26 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > >
> > > OK, it sounds like we need to specify what needs to be present in this
> > > sort of commit log.
> > 
> > That would have helped.
> > 
> > But in this case, after having looked more at it, it wasn't that the
> > commit log was not sufficient, it was that the change was wrong.
> > 
> > With a better commit log and a pointer to the KCSAN report, I would
> > have been able to make a better judgment call earlier, and not have
> > had to ask for more information.
> > 
> > But once I figured out what the problem was, it was clear that
> > 
> >  (a) what the i915 driver does is at a minimum questionable, and quite
> > likely actively buggy
> > 
> >  (b) even if it's not buggy in the i915 driver, changing the list
> > traversal macros to use READ_ONCE() would hide other places where it
> > most definitely is buggy.
> 
> Yeah, and I should especially have spotted this last one, given how many
> times I have run into it while using KCSAN within RCU.  ;-/
> 
> In any case, we have review comments, discussion, and the occasional
> controvery for non-KCSAN bug fixes, so it is only reasonable to expect
> a similar process for KCSAN bug reports, right?  My main goal here is
> to make things easier for people reviewing future KCSAN-inspired fixes.
> 
> > > o       The KCSAN splat, optionally including file/line numbers.
> > 
> > I do think that the KCSAN splat - very preferably simplified and with
> > explanations - should basically always be included if KCSAN was the
> > reason.
> > 
> > Of course, the "simplified and with explanations" may be simplified to
> > the point where none of the original splat even remains. Those splats
> > aren't so legible that anybody should feel like they should be
> > included.
> > 
> > Put another way: an analysis that is so thorough that it makes the raw
> > splat data pointless is a *good* thing, and if that means that no
> > splat is needed, all the better.
> 
> Fair point.  The KCSAN splat has the added advantage of immediately
> answering the "but does this really happen?" question, but then again
> that question could also be answered by just stating that the issue was
> found using KCSAN.
> 
> > > o       Detailed description of the problem this splat identifies, for
> > >         example, the code might fail to acquire a necessary lock, a plain
> > >         load might vulnerable to compiler optimizations, and so on.
> > 
> > See above: if the description is detailed enough, then the splat
> > itself becomes much less interesting.
> > 
> > But in this case, for example, it's worth pointing out that the "safe"
> > list accessors are used all over the kernel, and they are very much
> > _not_ thread-safe. The "safe" in them is purely about the current
> > thread removing an entry, not some kind of general thread-safeness.
> > 
> > Which is why I decided I really hated that patch - it basically would
> > make KCSAN unable to find _real_ races elsewhere, because it hid a
> > very special race in the i915 driver.
> > 
> > So I started out with the "that can't be right" kind of general
> > feeling of uneasiness about the patch. I ended up with a much more
> > specific "no, that's very much not right" and no amount of commit log
> > should have saved it.
> > 
> > As mentioned, I suspect that the i915 driver could actually use the
> > RCU-safe list walkers. Their particular use is not about RCU per se,
> > but it has some very similar rules about walking a list concurrently
> > with somebody adding an entry to it.
> > 
> > And the RCU list walkers do end up doing special things to load the
> > next pointer.
> > 
> > Whether we want to say - and document - that "ok, the RCU list walkers
> > are also useful for this non-RCU use case" in general might be worth
> > discussing.

This might work quite well, given the lockdep annotations that Joel
added to list_for_each_entry_rcu().  For example, lockdep_assert_held()
would tell it that it didn't need to be in an RCU reader as long as the
lock passed to lockdep_assert_held() was held at that point.

> > It may be that the i915 use case is so special that it's only worth
> > documenting there.
> 
> If documenting is the right approach, KCSAN's data_race() could be
> thought of as KCSAN-visible documentation.
> 
> I will touch base with Chris separately.

Of these options, what looks best to you?  Or would something else work
better?

							Thanx, Paul



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux