On Mon, Feb 28, 2022 at 11:14:44AM +0100, Jan Kara wrote:
> > case 1. Code with an actual circular dependency, but not deadlock.
> >
> >    A circular dependency can be broken by a rescue wakeup source e.g.
> >    timeout. It's not a deadlock. If it's okay that the contexts
> >    participating in the circular dependency and others waiting for the
> >    events in the circle are stuck until it gets broken. Otherwise, say,
> >    if it's not meant, then it's anyway problematic.
> >
> >    1-1. What if we judge this code is problematic?
> >    1-2. What if we judge this code is good?
> >
> > I've been wondering if the kernel guys esp. Linus considers code with
> > any circular dependency is problematic or not, even if it won't lead to
> > a deadlock, say, case 1. Even though I designed Dept based on what I
> > believe is right, of course, I'm willing to change the design according
> > to the majority opinion.
> >
> > However, I would never allow case 1 if I were the owner of the kernel
> > for better stability, even though the code works anyway okay for now.

Note, I used the example of the timeout as the most obvious way of
explaining that a deadlock is not possible.  There is also the much
more complex explanation which Jan was trying to give, which is what
leads to the circular dependency.  When trying to start a handle, it
can happen that either (a) there is not enough space in the journal
for new handles, or (b) the current transaction is so large that if we
don't close the transaction and start a new one, we will end up
running out of space in the future.  In that case, start_this_handle()
will block starting any more handles, and then wake up the commit
thread.  The commit thread then waits for the currently running
threads to complete, before it allows new handles to start, and then
it will complete the commit.  In the case of (a) we then need to do a
journal checkpoint, which is more work to release space in the
journal, and only then can we allow new handles to start.
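To make the sequence above easier to follow, here is a rough
single-threaded toy model of it.  This is only a sketch: the struct, its
fields, and the toy_*() functions are all made up for illustration and
are not the jbd2 data structures or API.  It assumes that "blocking" can
be modeled as a function returning false, and that a checkpoint simply
hands back a caller-supplied number of blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; none of these names exist in jbd2. */
struct toy_journal {
    int free_blocks;       /* space left in the journal */
    int txn_blocks;        /* blocks used by the running transaction */
    int txn_max;           /* size at which the transaction gets closed */
    int running_handles;   /* handles started but not yet stopped */
    bool commit_pending;   /* new handles are held back while set */
    bool need_checkpoint;  /* set when case (a) triggered the commit */
};

/*
 * Try to start a handle that needs `need` blocks.  Returns false where
 * the real code would block and wake up the commit thread.
 */
static bool toy_start_handle(struct toy_journal *j, int need)
{
    bool no_space = j->free_blocks < need;              /* case (a) */
    bool txn_full = j->txn_blocks + need > j->txn_max;  /* case (b) */

    if (j->commit_pending || no_space || txn_full) {
        j->need_checkpoint = j->need_checkpoint || no_space;
        j->commit_pending = true;
        return false;
    }
    j->free_blocks -= need;
    j->txn_blocks += need;
    j->running_handles++;
    return true;
}

/* A running handle completes. */
static void toy_stop_handle(struct toy_journal *j)
{
    assert(j->running_handles > 0);
    j->running_handles--;
}

/*
 * One attempt by the commit thread.  It cannot finish while handles are
 * still running.  Once they have drained it closes the transaction, and
 * in case (a) it also "checkpoints", modeled here as getting `reclaim`
 * blocks back.  Only then are new handles allowed again.
 */
static bool toy_commit(struct toy_journal *j, int reclaim)
{
    if (!j->commit_pending || j->running_handles > 0)
        return false;
    j->txn_blocks = 0;
    if (j->need_checkpoint) {
        j->free_blocks += reclaim;
        j->need_checkpoint = false;
    }
    j->commit_pending = false;
    return true;
}
```

The point of the model is the shape of the cycle: a blocked
toy_start_handle() is waiting on toy_commit(), and toy_commit() is
waiting on toy_stop_handle() calls from handles that are already
running.  Those running handles never need a new handle in order to
finish, so the cycle always drains.
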
The bottom line is (a) it works, and (b) there aren't significant
delays.  For DEPT to complain that this is somehow wrong, and that we
need to completely rearchitect perfectly working code because it
doesn't conform to DEPT's idea of what is "correct", is not acceptable.

> We have a queue of work to do Q protected by lock L. Consumer process has
> code like:
>
> while (1) {
>         lock L
>         prepare_to_wait(work_queued);
>         if (no work) {
>                 unlock L
>                 sleep
>         } else {
>                 unlock L
>                 do work
>                 wake_up(work_done)
>         }
> }
>
> AFAIU Dept will create dependency here that 'wakeup work_done' is after
> 'wait for work_queued'. Producer has code like:
>
> while (1) {
>         lock L
>         prepare_to_wait(work_done)
>         if (too much work queued) {
>                 unlock L
>                 sleep
>         } else {
>                 queue work
>                 unlock L
>                 wake_up(work_queued)
>         }
> }
>
> And Dept will create dependency here that 'wakeup work_queued' is after
> 'wait for work_done'. And thus we have a trivial cycle in the dependencies
> despite the code being perfectly valid and safe.

Cheers,

                                        - Ted
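For anyone who wants to poke at the quoted consumer/producer loops
outside the kernel, below is a rough userspace analogue using POSIX
threads.  It is a sketch under assumptions, not kernel code: a
pthread_cond_wait() loop stands in for prepare_to_wait() plus sleep, the
struct and the toy_*() names are invented, and the loops run a fixed
number of items instead of forever so that the program terminates.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for queue Q protected by lock L. */
struct toy_queue {
    pthread_mutex_t lock;        /* "L" */
    pthread_cond_t work_queued;  /* the consumer sleeps on this */
    pthread_cond_t work_done;    /* the producer sleeps on this */
    int queued;                  /* items currently in Q */
    int limit;                   /* "too much work queued" threshold */
    int total;                   /* items to push through before exiting */
    int consumed;                /* items the consumer has handled */
};

static void *toy_consumer(void *arg)
{
    struct toy_queue *q = arg;

    for (int i = 0; i < q->total; i++) {
        pthread_mutex_lock(&q->lock);
        while (q->queued == 0)                 /* no work: sleep */
            pthread_cond_wait(&q->work_queued, &q->lock);
        q->queued--;                           /* "do work" */
        q->consumed++;
        pthread_mutex_unlock(&q->lock);
        pthread_cond_signal(&q->work_done);    /* wake_up(work_done) */
    }
    return NULL;
}

static void *toy_producer(void *arg)
{
    struct toy_queue *q = arg;

    for (int i = 0; i < q->total; i++) {
        pthread_mutex_lock(&q->lock);
        while (q->queued >= q->limit)          /* too much work: sleep */
            pthread_cond_wait(&q->work_done, &q->lock);
        q->queued++;                           /* queue work */
        pthread_mutex_unlock(&q->lock);
        pthread_cond_signal(&q->work_queued);  /* wake_up(work_queued) */
    }
    return NULL;
}

static void toy_spawn(pthread_t *t, void *(*fn)(void *), void *arg)
{
    if (pthread_create(t, NULL, fn, arg) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        exit(EXIT_FAILURE);
    }
}

/* Push `total` items through a queue of depth `limit`; returns the number
 * consumed, or -1 if limit < 1 (the producer could then never queue). */
static int toy_run(int total, int limit)
{
    struct toy_queue q = { .limit = limit, .total = total };
    pthread_t prod, cons;

    if (limit < 1)
        return -1;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.work_queued, NULL);
    pthread_cond_init(&q.work_done, NULL);
    toy_spawn(&cons, toy_consumer, &q);
    toy_spawn(&prod, toy_producer, &q);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    pthread_cond_destroy(&q.work_done);
    pthread_cond_destroy(&q.work_queued);
    pthread_mutex_destroy(&q.lock);
    return q.consumed;
}
```

Calling toy_run(1000, 4) should return 1000.  Each thread waits on the
event the other one signals, which is the cycle described in the quoted
text, but the two wait conditions (queue empty, queue at its limit)
cannot both hold when the limit is at least 1, so one side can always
make progress.
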