On Fri, Dec 29, 2017 at 04:28:51PM +0900, Byungchul Park wrote:
> On Thu, Dec 28, 2017 at 10:51:46PM -0500, Theodore Ts'o wrote:
> > On Fri, Dec 29, 2017 at 10:47:36AM +0900, Byungchul Park wrote:
> > >
> > > (1) The best way: To classify all waiters correctly.
> >
> > It's really not all waiters, but all *locks*, no?
>
> Thanks for your opinion. I will add my opinion on you.
>
> I meant *waiters*. Locks are only a sub set of potential waiters, which
> actually cause deadlocks. Cross-release was designed to consider the
> super set including all general waiters such as typical locks,
> wait_for_completion(), and lock_page() and so on..

I think this is a terminology problem.  To me (and, I suspect, Ted), a
waiter is a subject of a verb while a lock is an object.  So Ted is
asking whether we have to classify the users, while I think you're
saying we have extra objects to classify.

I'd be comfortable continuing to refer to completions as locks.  We
could try to come up with a new object name like waitpoints though?

> > In addition, the lock classification system is not documented at all,
> > so now you also need someone who understands the lockdep code. And
> > since some of these classifications involve transient objects, and
> > lockdep doesn't have a way of dealing with transient locks, and has a
> > hard compile time limit of the number of locks that it supports, to
> > expect a subsystem maintainer to figure out all of the interactions,
> > plus figure out lockdep, and work around lockdep's limitations
> > seems.... not realistic.
>
> I have to think it more to find out how to solve it simply enough to be
> acceptable. The only solution I come up with for now is too complex.

I want to amplify Ted's point here.  How to use the existing lockdep
functionality is undocumented.  And that's not your fault.  We have
Documentation/locking/lockdep-design.txt which I'm sure is great for
someone who's willing to invest a week understanding it, but we need a
"here's how to use it" guide.

> > Given that once Lockdep reports a locking violation, it doesn't report
> > any more lockdep violations, if there are a large number of false
> > positives, people will not want to turn on cross-release, since it
> > will report the false positive and then turn itself off, so it won't
> > report anything useful. So if no one turns it on because of the false
> > positives, how does the bitrot problem get resolved?
>
> The problems come from wrong classification. Waiters either classfied
> well or invalidated properly won't bitrot.

I disagree here.  As Ted says, it's the interactions between the
subsystems that lead to problems.  Everything's going to work great
until somebody does something in a way that's never been tried before.