On Thu, Mar 03, 2022 at 10:00:33AM +0900, Byungchul Park wrote: > > Unfortunately, it's neither perfect nor safe without another wakeup > source - rescue wakeup source. > > consumer producer > > lock L > (too much work queued == true) > unlock L > --- preempted > lock L > unlock L > do work > lock L > unlock L > do work > ... > (no work == true) > sleep > --- scheduled in > sleep That's not how things work in ext4. It's **way** more complicated than that. We have multiple wait channels, one wake up the consumer (read: the commit thread), and one which wakes up any processes waiting for commit thread to have made forward progress. We also have two spin-lock protected sequence number, one which indicates the current commited transaction #, and one indicating the transaction # that needs to be committed. On the commit thread, it will sleep on j_wait_commit, and when it is woken up, it will check to see if there is work to be done (j_commit_sequence != j_commit_request), and if so, do the work, and then wake up processes waiting on the wait_queue j_wait_done_commit. (Again, all of this uses the pattern, "prepare to wait", then check to see if we should sleep, if we do need to sleep, unlock j_state_lock, then sleep. So this prevents any races leading to lost wakeups. On the start_this_handle() thread, if we current transaction is too full, we set j_commit_request to its transaction id to indicate that we want the current transaction to be committed, and then we wake up the j_wait_commit wait queue and then we enter a loop where do a prepare_to_wait in j_wait_done_commit, check to see if j_commit_sequence == the transaction id that we want to be completed, and if it's not done yet, we unlock the j_state_lock spinlock, and go to sleep. Again, because of the prepare_to_wait, there is no chance of a lost wakeup. So there really is no "consumer" and "producer" here. If you really insist on using this model, which really doesn't apply, for one thread, it's the consumer with respect to one wait queue, and the producer with respect to the *other* wait queue. For the other thread, the consumer and producer roles are reversed. And of course, this is a highly simplified model, since we also have a wait queue used by the commit thread to wait for the number of active handles on a particular transaction to go to zero, and stop_this_handle() will wake up commit thread via this wait queue when the last active handle on a particular transaction is retired. (And yes, that parameter is also protected by a different spin lock which is per-transaction). So it seems to me that a fundamental flaw in DEPT's model is assuming that the only waiting paradigm that can be used is consumer/producer, and that's simply not true. The fact that you use the term "lock" is also going to lead a misleading line of reasoning, because properly speaking, they aren't really locks. We are simply using wait channels to wake up processes as necessary, and then they will check other variables to decide whether or not they need to sleep or not, and we have an invariant that when these variables change indicating forward progress, the associated wait channel will be woken up. Cheers, - Ted P.S. This model is also highly simplified since there are other reasons why the commit thread can be woken up, some which might be via a timeout, and some which is via the j_wait_commit wait channel but not because j_commit_request has been changed, but because file system is being unmounted, or the file system is being frozen in preparation of a snapshot, etc. These are *not* necessary to prevent a deadlock, because under normal circumstances the two wake channels are sufficient of themselves. So please don't think of them as "rescue wakeup sources"; again, that's highly misleading and the wrong way to think of them. And to make things even more complicated, we have more than 2 wait channel --- we have *five*: /** * @j_wait_transaction_locked: * * Wait queue for waiting for a locked transaction to start committing, * or for a barrier lock to be released. */ wait_queue_head_t j_wait_transaction_locked; /** * @j_wait_done_commit: Wait queue for waiting for commit to complete. */ wait_queue_head_t j_wait_done_commit; /** * @j_wait_commit: Wait queue to trigger commit. */ wait_queue_head_t j_wait_commit; /** * @j_wait_updates: Wait queue to wait for updates to complete. */ wait_queue_head_t j_wait_updates; /** * @j_wait_reserved: * * Wait queue to wait for reserved buffer credits to drop. */ wait_queue_head_t j_wait_reserved; /** * @j_fc_wait: * * Wait queue to wait for completion of async fast commits. */ wait_queue_head_t j_fc_wait; "There are more things in heaven and Earth, Horatio, Than are dreamt of in your philosophy." - William Shakespeare, Hamlet