[RFC PATCH 04/11] ext4: data=journal: introduce helpers for journalled writeback deadlock

Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx> · Thu, 23 Apr 2020 20:36:58 -0300

This patch introduces the helper functions ext4_check_journalled_writeback(),
and ext4_start_commit_datasync(), to check for, and prevent the deadlock #2
(detailed in a previous commit.)

The former checks the transaction associated with the handle (parameter)
is also the transaction stored in the inode for datasync'ing operations
(set by __ext4_journalled_writepage()) if the page (parameter) is under
writeback (set by that function too.)

This patch also documents the steps to prevent the deadlock, if needed
(i.e., helper function returns true) which consist in a retry strategy,
using the latter helper.

The check may return false positives: i_datasync_tid and PageWriteback
are set by other functions than __ext4_journalled_writepage(); but not
false negatives.

So the code may unnecessarily stop/commit/start on false-positives,
but it does prevent deadlocks so it's reasonable cost-benefit case.

Signed-off-by: Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx>
---
 fs/ext4/ext4_jbd2.h | 72 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index 9ea8ee583931..fca6551dbf09 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -225,6 +225,78 @@ struct ext4_journalled_wb_page {
 
 extern struct kmem_cache *ext4_journalled_wb_page_cachep;
 
+/*
+ * ext4_check_journalled_writeback(handle, page).
+ * See __ext4_journalled_writepage().
+ *
+ * This function can be used to check for a potential deadlock if this task has
+ * a handle on a transaction and has to wait_on_page_writeback() when it's held.
+ * (NOTE: this affects grab_cache_page_write_begin() after ext4_journal_start())
+ *
+ * The deadlock occurs if another task has set_page_writeback() for the _same_
+ * page on the _same_ transaction, and this task calls wait_on_page_writeback().
+ *
+ * The held handle blocks the transaction commit, and thus end_page_writeback(),
+ * blocking this task in wait_on_page_writeback(); and only ext4_journal_stop()
+ * could unblock the commit, but it is not reached.
+ *
+ * If the function returns true, prevent the deadlock:
+ *
+ * 1) Stop handle to safely wait_on_page_writeback()
+ * 2) Start commiting the transaction (non-blocking)
+ *    saved in inode by __ext4_journalled_writepage()
+ * 3) Call wait_on_page_writeback()
+ * 4) Retry
+ *
+ * For example,
+ *
+ * retry:
+ *         // done before ext4_journal_start()
+ *         page = grab_cache_page_write_begin(mapping, ...);
+ *         if (ext4_should_journal_data(inode))
+ *                 wait_on_page_writeback(page);
+ *         unlock_page(page);
+ *
+ *         handle = ext4_journal_start(inode, ...);
+ *
+ *         lock_page(page);
+ *
+ *         // new code
+ *         if (ext4_check_journalled_writeback(handle, page) {
+ *                 unlock_page(page);
+ *                 put_page(page);
+ *                 ext4_journal_stop(handle);
+ *                 ext4_start_commit_datasync(inode);
+ *                 goto retry;
+ *         }
+ *
+ * Unfortunately the check may return false positives (e.g., non-mmaped/buffered
+ * pagecache that is under writeback that turned out to have same i_datasync_tid)
+ * and thus stop/commit/start unnecessarily.  But since it can prevent deadlocks
+ * and only affects the data=journal mode, it seems a reasonable cost/benefit.
+ */
+static inline int ext4_should_journal_data(struct inode *inode);
+
+static inline bool ext4_check_journalled_writeback(handle_t *handle,
+						   struct page *page)
+{
+	struct inode *inode = page->mapping->host;
+
+	BUG_ON(!ext4_should_journal_data(inode));
+
+	return (PageWriteback(page) &&
+		handle->h_transaction->t_tid == EXT4_I(inode)->i_datasync_tid);
+}
+
+static inline int ext4_start_commit_datasync(struct inode *inode)
+{
+	BUG_ON(!ext4_should_journal_data(inode));
+
+	/* Start commit associated with datasync transaction (non-blocking.) */
+	return jbd2_log_start_commit(EXT4_JOURNAL(inode),
+				     EXT4_I(inode)->i_datasync_tid);
+}
+
 int
 ext4_mark_iloc_dirty(handle_t *handle,
 		     struct inode *inode,
-- 
2.20.1