Re: [PATCH RFC v6 00/21] DEPT(Dependency Tracker)

Byungchul Park <byungchul.park@xxxxxxx> · Wed, 11 May 2022 08:39:29 +0900

On Tue, May 10, 2022 at 08:18:12PM +0900, Hyeonggon Yoo wrote:
> On Mon, May 09, 2022 at 09:16:37AM +0900, Byungchul Park wrote:
> > On Sat, May 07, 2022 at 04:20:50PM +0900, Hyeonggon Yoo wrote:
> > > On Fri, May 06, 2022 at 09:11:35AM +0900, Byungchul Park wrote:
> > > > Linus wrote:
> > > > >
> > > > > On Wed, May 4, 2022 at 1:19 AM Byungchul Park <byungchul.park@xxxxxxx> wrote:
> > > > > >
> > > > > > Hi Linus and folks,
> > > > > >
> > > > > > I've been developing a tool for detecting deadlock possibilities by
> > > > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > > > cover all synchonization machanisms.
> > > > > 
> > > > > So what is the actual status of reports these days?
> > > > > 
> > > > > Last time I looked at some reports, it gave a lot of false positives
> > > > > due to mis-understanding prepare_to_sleep().
> > > > 
> > > > Yes, it was. I handled the case in the following way:
> > > > 
> > > > 1. Stage the wait at prepare_to_sleep(), which might be used at commit.
> > > >    Which has yet to be an actual wait that Dept considers.
> > > > 2. If the condition for sleep is true, the wait will be committed at
> > > >    __schedule(). The wait becomes an actual one that Dept considers.
> > > > 3. If the condition is false and the task gets back to TASK_RUNNING,
> > > >    clean(=reset) the staged wait.
> > > > 
> > > > That way, Dept only works with what actually hits to __schedule() for
> > > > the waits through sleep.
> > > > 
> > > > > For this all to make sense, it would need to not have false positives
> > > > > (or at least a very small number of them together with a way to sanely
> > > > 
> > > > Yes. I agree with you. I got rid of them that way I described above.
> > > >
> > > 
> > > IMHO DEPT should not report what lockdep allows (Not talking about
> > 
> > No.
> > 
> > > wait events). I mean lockdep allows some kind of nested locks but
> > > DEPT reports them.
> > 
> > You have already asked exactly same question in another thread of
> > LKML. That time I answered to it but let me explain it again.
> > 
> > ---
> > 
> > CASE 1.
> > 
> >    lock L with depth n
> >    lock_nested L' with depth n + 1
> >    ...
> >    unlock L'
> >    unlock L
> > 
> > This case is allowed by Lockdep.
> > This case is allowed by DEPT cuz it's not a deadlock.
> > 
> > CASE 2.
> > 
> >    lock L with depth n
> >    lock A
> >    lock_nested L' with depth n + 1
> >    ...
> >    unlock L'
> >    unlock A
> >    unlock L
> > 
> > This case is allowed by Lockdep.
> > This case is *NOT* allowed by DEPT cuz it's a *DEADLOCK*.
> >
> 
> Yeah, in previous threads we discussed this [1]
> 
> And the case was:
> 	scan_mutex -> object_lock -> kmemleak_lock -> object_lock
> And dept reported:
> 	object_lock -> kmemleak_lock, kmemleak_lock -> object_lock as
> 	deadlock.
> 
> But IIUC - What DEPT reported happens only under scan_mutex and
> It is not simple just not to take them because the object can be removed from the
> list and freed while scanning via kmemleak_free() without kmemleak_lock and object_lock.

That should be one of the following order:

1. kmemleak_lock -> object_lock -> object_lock(nested)
2. object_lock -> object_lock(nested) -> kmemleak_lock

> Just I'm still not sure that someone will fix the warning in the future - even if the
> locking rule is not good - if it will not cause a real deadlock.

There's more important thing than making code just work for now. For
example, maintainance, communcation via code between current developers
and potential new commers in the future and so on.

At least, a comment describing why the wrong order in the code is safe
should be added. I wouldn't allow the current order in the code if I
were the maintainer.

	Byungchul

> > ---
> > 
> > The following scenario would explain why CASE 2 is problematic.
> > 
> >    THREAD X			THREAD Y
> > 
> >    lock L with depth n
> > 				lock L' with depth n
> >    lock A
> > 				lock A
> >    lock_nested L' with depth n + 1
> > 				lock_nested L'' with depth n + 1
> >    ...				...
> >    unlock L'			unlock L''
> >    unlock A			unlock A
> >    unlock L			unlock L'
> > 
> > Yes. I need to check if the report you shared with me is a true one, but
> > it's not because DEPT doesn't work with *_nested() APIs.
> >
> 
> Sorry, It was not right just to say DEPT doesn't work with _nested() APIs.
> 
> > 	Byungchul
> 
> [1] https://lore.kernel.org/lkml/20220304002809.GA6112@X58A-UD3R/
> 
> -- 
> Thanks,
> Hyeonggon