[RFC PATCH 09/11] ext4: grab page before starting transaction handle in ext4_try_to_write_inline_data()

Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx> · Thu, 23 Apr 2020 20:37:03 -0300

Since even just grab_cache_page_write_begin() might deadlock with
page writeback from __ext4_journalled_writepage() if stable pages
are required (as it does "lock_page(); wait_for_stable_page();")
and the handle to the same/running transaction is currently held,
it _cannot_ be called between ext4_journal_start/stop() as usual,
we have to be careful.

We have two options:

1) open-code the first part of grab_cache_page_write_begin()
   (before wait_for_stable_pages()) just before calling it,
   then check for deadlock and retry (a bit ugly but valid.)

2) move it before ext4_journal_start() to avoid the deadlock.

Option 2 has been done as optimization to ext4_write_begin()
in commit 47564bfb95bf ("ext4: grab page before starting
transaction handle in write_begin()"), and can similarly
apply to this case.

Since it sounds more official, counts as optimization that
prevents long transaction hold time, and isn't ugly, do it.

Signed-off-by: Mauricio Faria de Oliveira <mfo@xxxxxxxxxxxxx>
---
 fs/ext4/inline.c | 46 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 10665b066bb7..9bb85e06854d 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -692,37 +692,65 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
 	if (ret)
 		return ret;
 
+	/*
+	 * grab_cache_page_write_begin() can take a long time if the
+	 * system is thrashing due to memory pressure, or if the page
+	 * is being written back.  So grab it first before we start
+	 * the transaction handle.  This also allows us to allocate
+	 * the page (if needed) without using GFP_NOFS.
+	 */
+retry_grab:
+	page = grab_cache_page_write_begin(mapping, 0, flags);
+	if (!page) {
+		ret = -ENOMEM;
+		handle = NULL;
+		goto out;
+	}
+	unlock_page(page);
+
 	/*
 	 * The possible write could happen in the inode,
 	 * so try to reserve the space in inode first.
 	 */
 	handle = ext4_journal_start(inode, EXT4_HT_INODE, 1);
 	if (IS_ERR(handle)) {
+		put_page(page);
 		ret = PTR_ERR(handle);
 		handle = NULL;
 		goto out;
 	}
 
+	lock_page(page);
+	if (page->mapping != mapping) {
+		/* The page got truncated from under us */
+		unlock_page(page);
+		put_page(page);
+		ext4_journal_stop(handle);
+		goto retry_grab;
+	}
+	/* In case writeback began while the page was unlocked */
+	wait_for_stable_page(page);
+
 	ret = ext4_prepare_inline_data(handle, inode, pos + len);
-	if (ret && ret != -ENOSPC)
+	if (ret && ret != -ENOSPC) {
+		unlock_page(page);
+		put_page(page);
 		goto out;
+	}
 
 	/* We don't have space in inline inode, so convert it to extent. */
 	if (ret == -ENOSPC) {
+		unlock_page(page);
+		put_page(page);
 		ext4_journal_stop(handle);
 		brelse(iloc.bh);
 		goto convert;
 	}
 
 	ret = ext4_journal_get_write_access(handle, iloc.bh);
-	if (ret)
-		goto out;
-
-	flags |= AOP_FLAG_NOFS;
-
-	page = grab_cache_page_write_begin(mapping, 0, flags);
-	if (!page) {
-		ret = -ENOMEM;
+	if (ret) {
+		unlock_page(page);
+		put_page(page);
 		goto out;
 	}
 
-- 
2.20.1