Re: [PATCH v2 1/3] mm: memory-failure: update ttu flag inside unmap_poisoned_folio

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21.01.25 04:20, Miaohe Lin wrote:
On 2025/1/20 16:46, David Hildenbrand wrote:
On 20.01.25 08:49, David Hildenbrand wrote:

        if (folio_test_hugetlb(folio) && !folio_test_anon(folio)) {
            struct address_space *mapping;
    @@ -1572,7 +1598,7 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
            if (!mapping) {
                pr_info("%#lx: could not lock mapping for mapped hugetlb folio\n",
                    folio_pfn(folio));
-            return;
+            return -EBUSY;
            }
               try_to_unmap(folio, ttu|TTU_RMAP_LOCKED);
@@ -1580,6 +1606,8 @@ void unmap_poisoned_folio(struct folio *folio, enum ttu_flags ttu)
        } else {
            try_to_unmap(folio, ttu);
        }
+
+    return folio_mapped(folio) ? -EBUSY : 0;

Do we really need this return value? It's unused in do_migrate_range().

I suggested it, because the folio_mapped() is nowadays extremely cheap.
It cleans up hwpoison_user_mappings() quite nicely.

I'm also wondering, if in do_migrate_range(), we want to pr_warn_ratelimit() in case still mapped after the call. IIUC, we don't really expect this to happen with SYNC set.

Do you mean TTU_SYNC? It seems it's not set.

With your patch it will be now, which is the right thing to do I think.


There might be a race will hit the proposed pr_warn_ratelimit():

/* Assume folio is isolated for reclaim, so memory_failure failed to handle it at first time. Then it's put back to LRU. */
do_migrate_range
  folio_test_hwpoison
   folio_mapped
   <folio is isolated for reclaim again.>
    unmap_poisoned_folio
   <folio is put buck.>
     pr_warn_ratelimit(folio_mapped)

But I might be miss something. And even this race is possible, it should be really hard to hit.

Does try_to_unmap() care about isolation? Skimming over the code, I don't think so. I assume once we take the folio lock, races with reclaim are impossible.

In any case, the race is unexpected, so pr_warn_() would be helpful and not harmful.

Memory offlining code will later simply skip all PageHWPoison() pages, independent of the refcount as it seems. Failing to unmap might not be handled correctly at all ... I think this might be problematic in other regard (e.g., GUP references), but failing to unmap is "obviously" bad I think :)

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux