On Tue, Feb 27, 2024 at 04:25:46PM -0500, Phillip Susi wrote: > I noticed that every time I sync ( which happens automatically when > you suspend to ram ), ext4 issues a flush to the block device, even > though there have been no writes to flush. This appears to be because > jbd2_trans_will_send_data_barrier() returns a 0 when no transaction > has been started. The intent appears to be that a transaction that > has completed should return 0, and that when there is NO transaction, > it should return a 1, but the tests were in the wrong order, leading > to the 0 to be returned before checking for the absense of a > transaction at all. Reversing the order allows my disk to remain in > runtime_pm when syncing. > > I *think* this is correct, but I'm not very familliar with jbd2, so it > may have unintended consequences. What do you think? Yeah, this change is going to problems. The basic idea here is if when we request that a transaction to commit, will it issue a a commit? If so, then fsync(2) doesn't need to issue a barrier (i.e., a cache flush command). So for example, if a database does an overwriting write to a file block which is already allocated, and then follows it up with a fdatasync(2), there won't be any need to make any metadata changes as part of writing out the changed block. Hence, we won't need to start a new jbd2 transaction, and in that case, current transaction has already commited, so the jbd2 layer won't need to do anything, and so it won't have issued a commit. In that case, jbd2_trans_will_send_data_barrier() needs to return false, so that fdatasync(2) will actually issue a cache flush command. The patch you've proposed will cause fdatasync(2) to not issue a barrier, which could lead to the write to the database file getting lost after a power fail event, which would make the database adminisrtator very sad. Cheers, - Ted