Re: [PATCH] kunit: added lockdep support

peterz@xxxxxxxxxxxxx · Tue, 11 Aug 2020 21:05:17 +0200

On Tue, Aug 11, 2020 at 12:03:51PM -0500, Uriel Guajardo wrote:
> On Mon, Aug 10, 2020 at 4:43 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Aug 10, 2020 at 09:32:57PM +0000, Uriel Guajardo wrote:
> > > +static inline void kunit_check_locking_bugs(struct kunit *test,
> > > +                                         unsigned long saved_preempt_count)
> > > +{
> > > +     preempt_count_set(saved_preempt_count);
> > > +#ifdef CONFIG_TRACE_IRQFLAGS
> > > +     if (softirq_count())
> > > +             current->softirqs_enabled = 0;
> > > +     else
> > > +             current->softirqs_enabled = 1;
> > > +#endif
> > > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > > +     local_irq_disable();
> > > +     if (!debug_locks) {
> > > +             kunit_set_failure(test);
> > > +             lockdep_reset();
> > > +     }
> > > +     local_irq_enable();
> > > +#endif
> > > +}
> >
> > Unless you can guarantee this runs before SMP bringup, that
> > lockdep_reset() is terminally broken.
> 
> Good point. KUnit is initialized after SMP is set up, and KUnit can
> also be built as a module, so it's not a guarantee that we can make.

Even if you could, there's still the question of whether throwing out all
the dependencies learned during boot is a sensible idea.

> Is there any other way to turn lockdep back on after we detect a
> failure? It would be ideal if lockdep could still run in the next test
> case after a failure in a previous one.

Not really; the moment lockdep reports a failure it turns off all
tracking and we instantly lose state.
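
To make the one-shot behaviour concrete, here is a minimal userspace
sketch. It is only loosely modeled on the debug_locks flag tested in the
patch above; every toy_* name is hypothetical and none of this is the
actual lockdep code. It assumes the flag is cleared atomically by the
first reporter, after which nothing is tracked or reported any more.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a global "tracking enabled" flag (hypothetical). */
static atomic_int toy_debug_locks = 1;

/* Number of reports actually emitted by this toy. */
static int toy_reports;

/*
 * Clear the flag. Returns true only for the caller that flipped it
 * from 1 to 0, so at most one caller ever gets to report.
 */
static bool toy_debug_locks_off(void)
{
	return atomic_exchange(&toy_debug_locks, 0) == 1;
}

/* Called when a locking violation is detected. */
static void toy_report_violation(void)
{
	if (toy_debug_locks_off())
		toy_reports++;	/* first violation: report it */
	/* later violations: tracking is already off, stay silent */
}

/* Would an acquisition be tracked right now? */
static bool toy_tracking_enabled(void)
{
	return atomic_load(&toy_debug_locks) != 0;
}
```

Calling toy_report_violation() twice leaves toy_reports at 1 and
toy_tracking_enabled() false; anything acquired or released after that
point goes unrecorded, which is why simply flipping the flag back on
would resume from stale state.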

You'd have to:

 - delete the 'mistaken' dependency from the graph such that we lose
   the cycle, otherwise it will continue to find and report the cycle.

 - put every task through a known empty state which turns the tracking
   back on.

Bart implemented most of what you need for the first item last year or
so, but the remaining bit and the second item would still be a fair
amount of work.
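
As a rough illustration of the first item only, here is a toy userspace
sketch of a dependency graph in which a cycle keeps being found until the
offending edge is deleted. The adjacency-matrix layout and all toy_*
names are assumptions made for the example; this is not how lockdep
stores its graph, and it does nothing about the second item.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_CLASSES 8	/* arbitrary toy limit, not a lockdep constant */

/* dep[a][b] is true when lock class b was taken while a was held (a -> b). */
struct toy_graph {
	bool dep[TOY_MAX_CLASSES][TOY_MAX_CLASSES];
};

static void toy_graph_init(struct toy_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Record the dependency a -> b. */
static void toy_add_dep(struct toy_graph *g, int a, int b)
{
	assert(a >= 0 && a < TOY_MAX_CLASSES);
	assert(b >= 0 && b < TOY_MAX_CLASSES);
	g->dep[a][b] = true;
}

/* Forget the dependency a -> b. */
static void toy_del_dep(struct toy_graph *g, int a, int b)
{
	assert(a >= 0 && a < TOY_MAX_CLASSES);
	assert(b >= 0 && b < TOY_MAX_CLASSES);
	g->dep[a][b] = false;
}

/* Depth-first search: can we get from 'from' to 'to' along recorded edges? */
static bool toy_reachable(const struct toy_graph *g, int from, int to,
			  bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < TOY_MAX_CLASSES; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    toy_reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/* A cycle exists if some edge a -> b has a path leading from b back to a. */
static bool toy_has_cycle(const struct toy_graph *g)
{
	for (int a = 0; a < TOY_MAX_CLASSES; a++) {
		for (int b = 0; b < TOY_MAX_CLASSES; b++) {
			bool seen[TOY_MAX_CLASSES] = { false };

			if (g->dep[a][b] && toy_reachable(g, b, a, seen))
				return true;
		}
	}
	return false;
}
```

With edges 0 -> 1, 1 -> 2 and 2 -> 0 recorded, toy_has_cycle() returns
true on every call until toy_del_dep(&g, 2, 0) removes the edge that
closed the loop.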

Also, I'm really not sure it's worth it; the kernel should be free of
lock cycles, so just fix one, reboot and continue.

> I suppose we could only display the first failure that occurs, similar
> to how lockdep does it. But it could also be useful to developers if
> they saw failures in subsequent test cases, with the knowledge that
> those failures may be unreliable.

People already struggle with lockdep reports enough; I really don't want
to give them dodgy reports to worry about.