Re: [PATCH v4 4/6] mm: migrate: support poisoned recover from migrate folio


 



On 6/6/2024 9:01 PM, Kefeng Wang wrote:



On 2024/6/7 6:31, Jane Chu wrote:

On 6/6/2024 3:28 PM, Jane Chu wrote:
On 6/6/2024 2:27 PM, Jane Chu wrote:

On 6/3/2024 2:24 AM, Kefeng Wang wrote:
diff --git a/mm/migrate.c b/mm/migrate.c
index e930376c261a..28aa9da95781 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -663,16 +663,29 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
                 struct folio *src, void *src_private,
                 enum migrate_mode mode)
  {
-    int rc;
+    int ret, expected_cnt = folio_expected_refs(mapping, src);
 
-    rc = folio_migrate_mapping(mapping, dst, src, 0);
-    if (rc != MIGRATEPAGE_SUCCESS)
-        return rc;
+    if (!mapping) {
+        if (folio_ref_count(src) != expected_cnt)
+            return -EAGAIN;
+    } else {
+        if (!folio_ref_freeze(src, expected_cnt))
+            return -EAGAIN;
+    }
+
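
For readers following the hunk above: it chooses between a plain reference-count comparison (no mapping) and folio_ref_freeze() (with a mapping). The following is only a rough userspace sketch of that distinction using C11 atomics, not the kernel code. The toy_folio type and every toy_* helper are hypothetical names made up for illustration, and the sketch assumes the freeze behaves like a compare-and-swap from the expected count to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; only the reference count is modelled. */
struct toy_folio {
    atomic_int refcount;
};

/* No-mapping case: just observe the count, nothing is frozen. */
static bool toy_ref_matches(struct toy_folio *f, int expected)
{
    return atomic_load(&f->refcount) == expected;
}

/*
 * Mapping case: atomically swap expected -> 0. While the count is 0,
 * a lookup that only takes a reference when the count is non-zero
 * cannot grab the folio. Fails if someone holds an extra reference.
 */
static bool toy_ref_freeze(struct toy_folio *f, int expected)
{
    int want = expected;

    return atomic_compare_exchange_strong(&f->refcount, &want, 0);
}

/* Undo a successful freeze by restoring the count. */
static void toy_ref_unfreeze(struct toy_folio *f, int count)
{
    assert(atomic_load(&f->refcount) == 0); /* must currently be frozen */
    atomic_store(&f->refcount, count);
}
```

In this model, a failed comparison or a failed freeze corresponds to the -EAGAIN returns in the hunk: the caller backs off and retries later rather than migrating a folio that someone else still references.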

Let me take a guess: the reason you split up folio_migrate_copy() is that folio_mc_copy() should be done before the 'src' folio's ->flags is changed, right?

Is there any other reason?  Could you add a comment please?

I see, both the clearing of the 'dirty' bit in the source folio and the xas_store of the new folio to the mapping need to be done after folio_mc_copy, considering that in the event of a UE, memory_failure() is called to handle the poison in the source page.

Yes, a lot of metadata is changed, and also some statistics (lruvec_state), so we have to move folio_copy() ahead.

That said, since the poisoned page is queued up and handled asynchronously, in theory there is an extremely unlikely chance that memory_failure() is invoked after folio_migrate_mapping(). Do you think things would still be cool?

Hmm, perhaps after xas_store, the source folio->mapping should be set to NULL.

When folio_mc_copy() returns -EHWPOISON, we never call
folio_migrate_mapping(), so the source folio is not changed and
it should be safe to handle the source folio with an asynchronous
memory_failure(). Maybe I'm missing something?

Right, I omitted this part, thanks!
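
To make the ordering argument above concrete, here is a small userspace model, offered only as a sketch and not as the patch itself. The mig_folio type, the toy_mc_copy() and toy_migrate() helpers, and the "poisoned" flag are all hypothetical; the fallback value for EHWPOISON is likewise an assumption for systems whose errno.h does not define it. The point it illustrates is that the copy runs first, so a poison error returns before any source metadata is modified.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#ifndef EHWPOISON
#define EHWPOISON 133 /* assumed fallback; Linux's errno.h normally provides it */
#endif

#define TOY_SIZE 8

/* Hypothetical model of a folio: payload plus the bits discussed above. */
struct mig_folio {
    unsigned char data[TOY_SIZE];
    bool poisoned;   /* stand-in for a hardware uncorrectable error */
    bool dirty;
    bool in_mapping; /* stand-in for the mapping slot pointing here */
};

/* Stand-in for a machine-check-safe copy: fails instead of crashing. */
static int toy_mc_copy(struct mig_folio *dst, const struct mig_folio *src)
{
    if (src->poisoned)
        return -EHWPOISON;
    memcpy(dst->data, src->data, TOY_SIZE);
    return 0;
}

/* Copy first; only touch source metadata once the copy has succeeded. */
static int toy_migrate(struct mig_folio *dst, struct mig_folio *src)
{
    int ret;

    assert(dst && src);
    ret = toy_mc_copy(dst, src);
    if (ret)
        return ret; /* source is left exactly as it was */

    /* Metadata transfer, modelled after the copy has completed. */
    dst->dirty = src->dirty;
    src->dirty = false;
    dst->in_mapping = true;
    src->in_mapping = false;
    return 0;
}
```

Because the failure path returns before the metadata transfer, a later asynchronous handler in this model would still see the source with its original dirty and mapping state.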

PS: we test it via error injection to dimm and then soft offline memory.

Got it.

Reviewed-by: Jane Chu <jane.chu@xxxxxxxxxx>

thanks,

-jane


Thanks.



