Vlastimil Babka wrote on 2020/7/14 at 5:46 PM:
On 7/13/20 3:57 AM, Robbie Ko wrote:
Vlastimil Babka wrote on 2020/7/10 at 11:31 PM:
On 7/9/20 4:48 AM, robbieko wrote:
From: Robbie Ko <robbieko@xxxxxxxxxxxx>
When a page is migrated, we first create a migration entry
to replace the original pte, and then go to fallback_migrate_page
to execute a writeout if migratepage is not supported.
In the writeout, we clear the dirty bit of the page and use
page_mkclean to clear the dirty bit of the corresponding pte as well,
but page_mkclean does not handle migration entries.
The page dirty bit is cleared, but the dirty bit of the pte still exists,
so if writes through the mmap continue, data is lost.
Curious, did you observe this data loss? What filesystem? If yes, it seems
serious enough to CC stable and determine a Fixes: tag?
Yes, there is data loss.
I'm using btrfs, but without the following patch
And the kernel is otherwise upstream? Which version?
Anyway we better let btrfs guys know (+CC) even if the fix is in MM code.
Kernel version is 4.4.
I think this is a bug that has been around for a long time.
I think the problem is not limited to btrfs: any other fs that
has not implemented the migratepage callback will hit the same
problem (e.g. ecryptfs, fat, nfs...).
btrfs: implement migratepage callback for data pages
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8-rc5&id=f8e6608180a31cc72a23b74969da428da236dbd1
That's a new commit, so if this is really affecting upstream btrfs pre-5.8 we
should either backport that commit, or your fix (after review).
We fix this by first removing the migration entry and then clearing
the dirty bits of the page, which also clears the pte's dirty bits.
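
To make the ordering issue concrete, here is a small user-space toy model, not kernel code. Every toy_* name is hypothetical and exists only for illustration. It assumes, as a simplification, that a pte left writable lets a later mmap store proceed without a write fault, so the page is never marked dirty again.

```c
#include <stdbool.h>

/* Toy stand-ins; these are NOT the kernel's real types or helpers. */
struct toy_pte {
    bool is_migration_entry; /* pte currently replaced by a migration entry */
    bool writable;
    bool dirty;
};

struct toy_page {
    bool dirty;              /* models the page dirty flag */
    struct toy_pte *pte;     /* single mapping, for simplicity */
};

/* Models page_mkclean: a migration entry is skipped entirely. */
static void toy_page_mkclean(struct toy_page *p)
{
    if (p->pte->is_migration_entry)
        return;
    p->pte->dirty = false;
    p->pte->writable = false; /* write-protect so the next store faults */
}

/* Models clear_page_dirty_for_io: clean the pte, then the page flag. */
static bool toy_clear_page_dirty_for_io(struct toy_page *p)
{
    bool was_dirty = p->dirty;

    toy_page_mkclean(p);
    p->dirty = false;
    return was_dirty;
}

/* Models a store through the mmap after writeout has finished. */
static void toy_mmap_write(struct toy_page *p)
{
    if (!p->pte->writable) {
        /* Write fault path: the page is marked dirty again. */
        p->pte->writable = true;
        p->dirty = true;
    }
    p->pte->dirty = true;
}

/*
 * Runs the writeout sequence in either order and reports whether a
 * later mmap store leaves the page tracked as dirty.
 */
bool toy_writeout(bool remove_entry_first)
{
    struct toy_pte pte = { .is_migration_entry = false,
                           .writable = true, .dirty = true };
    struct toy_page page = { .dirty = true, .pte = &pte };

    pte.is_migration_entry = true;         /* migration entry installed */

    if (remove_entry_first) {
        pte.is_migration_entry = false;    /* patched order */
        toy_clear_page_dirty_for_io(&page);
    } else {
        toy_clear_page_dirty_for_io(&page); /* original order */
        pte.is_migration_entry = false;
    }

    toy_mmap_write(&page);
    return page.dirty;
}
```

In this model, toy_writeout(false) returns false: the page flag was cleared while the pte stayed writable and dirty, so the later store is never tracked. toy_writeout(true) returns true, because the pte was write-protected and the store goes through the fault path.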
Signed-off-by: Robbie Ko <robbieko@xxxxxxxxxxxx>
---
mm/migrate.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index f37729673558..5c407434b9ba 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -875,10 +875,6 @@ static int writeout(struct address_space *mapping, struct page *page)
/* No write method for the address space */
return -EINVAL;
- if (!clear_page_dirty_for_io(page))
- /* Someone else already triggered a write */
- return -EAGAIN;
-
/*
* A dirty page may imply that the underlying filesystem has
* the page on some queue. So the page must be clean for
@@ -889,6 +885,10 @@ static int writeout(struct address_space *mapping, struct page *page)
*/
remove_migration_ptes(page, page, false);
+ if (!clear_page_dirty_for_io(page))
+ /* Someone else already triggered a write */
+ return -EAGAIN;
+
rc = mapping->a_ops->writepage(page, &wbc);
if (rc != AOP_WRITEPAGE_ACTIVATE)