On Mon, Feb 28, 2022 at 04:25:04PM -0500, Theodore Ts'o wrote:
> On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > > case 1. Code with an actual circular dependency, but not deadlock.
> > >
> > >    A circular dependency can be broken by a rescue wakeup source e.g.
> > >    timeout. It's not a deadlock. If it's okay that the contexts
> > >    participating in the circular dependency and others waiting for the
> > >    events in the circle are stuck until it gets broken. Otherwise, say,
> > >    if it's not meant, then it's anyway problematic.
> > >
> > >    1-1. What if we judge this code is problematic?
> > >    1-2. What if we judge this code is good?
> > >
> > > I've been wondering if the kernel guys esp. Linus considers code with
> > > any circular dependency is problematic or not, even if it won't lead to
> > > a deadlock, say, case 1. Even though I designed Dept based on what I
> > > believe is right, of course, I'm willing to change the design according
> > > to the majority opinion.
> > >
> > > However, I would never allow case 1 if I were the owner of the kernel
> > > for better stability, even though the code works anyway okay for now.
>
> Note, I used the example of the timeout as the most obvious way of
> explaining that a deadlock is not possible. There is also the much
> more complex explanation which Jan was trying to give, which is what
> leads to the circular dependency. It can happen that when trying to
> start a handle, if either (a) there is not enough space in the journal
> for new handles, or (b) the current transaction is so large that if we
> don't close the transaction and start a new one, we will end up
> running out of space in the future, and so in that case,
> start_this_handle() will block starting any more handles, and then
> wake up the commit thread. The commit thread then waits for the
> currently running threads to complete, before it allows new handles to
> start, and then it will complete the commit.
> In the case of (a) we then need to do a journal checkpoint, which is
> more work to release space in the journal, and only then, can we
> allow new handles to start.

Thank you for the full explanation of how journal things work.

> The bottom line is (a) it works, (b) there aren't significant delays,
> and for DEPT to complain that this is somehow wrong and we need to
> completely rearchitect perfectly working code because it doesn't
> conform to DEPT's idea of what is "correct" is not acceptable.

Thanks to you and Jan Kara, I realized it's not a real dependency in the
consumer and producer scenario, but again *ONLY IF* there is a rescue
wakeup source. In that case, Dept should track the rescue wakeup source
instead.

I won't ask you to rearchitect the working code. The code looks sane.

Thanks a lot.

Thanks,
Byungchul

> > We have a queue of work to do Q protected by lock L. Consumer process has
> > code like:
> >
> > while (1) {
> >         lock L
> >         prepare_to_wait(work_queued);
> >         if (no work) {
> >                 unlock L
> >                 sleep
> >         } else {
> >                 unlock L
> >                 do work
> >                 wake_up(work_done)
> >         }
> > }
> >
> > AFAIU Dept will create dependency here that 'wakeup work_done' is after
> > 'wait for work_queued'. Producer has code like:
> >
> > while (1) {
> >         lock L
> >         prepare_to_wait(work_done)
> >         if (too much work queued) {
> >                 unlock L
> >                 sleep
> >         } else {
> >                 queue work
> >                 unlock L
> >                 wake_up(work_queued)
> >         }
> > }
> >
> > And Dept will create dependency here that 'wakeup work_queued' is after
> > 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> > despite the code being perfectly valid and safe.
>
> Cheers,
>
> - Ted
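
For reference, the quoted consumer/producer pattern can be sketched in
userspace. This is only an illustration under stated assumptions, not
jbd2 or Dept code: Python's threading.Condition stands in for the kernel
wait queues, the timeout passed to wait() plays the role of the "rescue
wakeup source", and run_demo, total_items, max_queued and rescue_timeout
are hypothetical names chosen for the example. Unlike the pseudocode,
notify() is called with the lock held, because threading.Condition
requires it.

```python
import threading
from collections import deque


def run_demo(total_items=20, max_queued=4, rescue_timeout=0.1):
    """Run one producer and one consumer; return (finished, processed)."""
    lock = threading.Lock()                   # "L" in the quoted pseudocode
    work_queued = threading.Condition(lock)   # the consumer sleeps on this
    work_done = threading.Condition(lock)     # the producer sleeps on this
    queue = deque()                           # "Q"
    processed = []                            # touched only by the consumer

    def consumer():
        while len(processed) < total_items:
            with lock:
                if not queue:
                    # Wait for work_queued; the timeout is the assumed
                    # rescue wakeup source that bounds the sleep.
                    work_queued.wait(timeout=rescue_timeout)
                    continue
                item = queue.popleft()
            processed.append(item)            # "do work" outside the lock
            with lock:
                work_done.notify()            # wake_up(work_done)

    def producer():
        for item in range(total_items):
            with lock:
                while len(queue) >= max_queued:   # "too much work queued"
                    work_done.wait(timeout=rescue_timeout)
                queue.append(item)                # "queue work"
                work_queued.notify()              # wake_up(work_queued)

    threads = [
        threading.Thread(target=consumer, daemon=True),
        threading.Thread(target=producer, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    finished = not any(t.is_alive() for t in threads)
    return finished, processed


if __name__ == "__main__":
    finished, processed = run_demo()
    print(finished, len(processed))
```

Each side waits on one event and wakes the other, which is the cycle a
dependency tracker would see, yet both threads run to completion. With a
single producer and a single consumer the items are processed in FIFO
order.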