On Fri, Dec 29, 2017 at 10:16:24PM -0800, Matthew Wilcox wrote: > > I think this is a terminology problem. To me (and, I suspect Ted), a > waiter is a subject of a verb while a lock is an object. So Ted is asking > whether we have to classify the users, while I think you're saying we > have extra objects to classify. Exactly, the classification is applied when the {lock, mutex, completion} object is initialized. Not currently at the individual call points to mutex_lock(), wait_for_completion(), down_write(), etc. > > The problems come from wrong classification. Waiters either classfied > > well or invalidated properly won't bitrot. > > I disagree here. As Ted says, it's the interactions between the > subsystems that leads to problems. Everything's goig to work great > until somebody does something in a way that's never been tried before. The question what is classified *well* mean? At the extreme, we could put the locks for every single TCP connection into their own lockdep class. But that would blow the limits in terms of the number of locks out of the water super-quickly --- and it would destroy the ability for lockdep to learn what the proper locking order should be. Yet given Lockdep's current implementation, the only way to guarantee that there won't be any interactions between subsystems that cause false positives would be to categorizes locks for each TCP connection into their own class. So this is why I get a little annoyed when you say, "it's just a matter of classification". NO IT IS NOT. We can not possibly classify things "correctly" to completely limit false positives without completely destroying lockdep's scalability as it is currently designed. Byungchul, you don't acknowledge this, and it makes the "just classify everything" argument completely suspect as a result. As far as the "just invalidate the waiter", the problem is that it requires source level changes to invalidate the waiter, and for different use cases, we will need to validate different waiters. For example, in the example I gave, we would have to invalidate *all* TCP waiters/locks in order to prevent false positives. But that makes the lockdep useless for all TCP locks. What's the solution? I claim that until lockdep is fundamentally fixed, there is no way to eliminate *all* false positives without invalidating *all* cross-release/cross-locks --- in which case you might as well leave the cross-release patches as an out of tree patch. So to claim that we can somehow fix the problem by making source-level changes outside of lockdep, by "properly classifying" or "properly invalidating" all locks, just doesn't make sense. The only way it can work is to either dump it on the reposibility of the people debugging lockdep reports to make source level changes to other subsystems which they aren't the maintainers of to suppress false positives that arise due to how the subsystems are being used together in their particular configuration ---- or you can try to claim that there is an "acceptable level" of false positives with which we can live with forever, and which can not be fixed by "proper classifying" the locks. Or you can try to make lockdep scalable enough that if we could put every single lock for every single object into its own lock class (e.g., each lock for every single TCP connection gets its own lock class) which is after all the only way we can "properly classify everything") and still let lockdep be useful. If you think that is doable, why don't you work on that, and once that is done, maybe cross-locks lockdep will be considered more acceptable for mainstream? - Ted