On 3/29/22 14:55, David Hildenbrand wrote:
On 29.03.22 22:42, Khalid Aziz wrote:
On 3/29/22 07:59, David Hildenbrand wrote:
On 15.03.22 11:47, David Hildenbrand wrote:
In case arch_unmap_one() fails, we already did a swap_duplicate(). Let's
undo that properly via swap_free().
Fixes: ca827d55ebaa ("mm, swap: Add infrastructure for saving page metadata on swap")
Reviewed-by: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
mm/rmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/rmap.c b/mm/rmap.c
index 6a1e8c7f6213..f825aeef61ca 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1625,6 +1625,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
break;
}
if (arch_unmap_one(mm, vma, address, pteval) < 0) {
+ swap_free(entry);
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
Hi Khalid,
I'm a bit confused about the semantics of arch_unmap_one(); I hope you can clarify.
See patch #11 in this series, where we can fail unmapping after arch_unmap_one() succeeded. E.g.,
@@ -1623,6 +1634,24 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
page_vma_mapped_walk_done(&pvmw);
break;
}
+ if (anon_exclusive &&
+ page_try_share_anon_rmap(subpage)) {
+ swap_free(entry);
+ set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ break;
+ }
+ /*
+ * Note: We don't remember yet if the page was mapped
+ * exclusively in the swap entry, so swapin code has
+ * to re-determine that manually and might detect the
+ * page as possibly shared, for example, if there are
+ * other references on the page or if the page is under
+ * writeback. We made sure that there are no GUP pins
+ * on the page that would rely on it, so for GUP pins
+ * this is fine.
+ */
if (list_empty(&mm->mmlist)) {
spin_lock(&mmlist_lock);
if (list_empty(&mm->mmlist))
So far, I was under the impression that we don't have to undo anything after
arch_unmap_one() succeeded, because we seem not to do anything for the two
cases below. But looking into arch_unmap_one() and how it allocates memory, I do
wonder what we would actually want to do here -- I'd assume we'd want to
trigger del_tag_store() somehow?
Hi David,
Hi,
thanks for your fast reply.
Currently, once arch_unmap_one() completes successfully, we are at the point of no return for this pte. It will be
replaced by a swap pte soon thereafter. Patch 11 adds another case where we may return without replacing the current pte
with a swap pte. For now, could you resolve this by moving the code block above in patch 11 to before the call to
arch_unmap_one()? That still leaves open the issue of having the flexibility to undo what arch_unmap_one() does for some
other reason in the future. That will require coming up with a properly architected way to do it.
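For illustration, a rough sketch of that reordering (the context lines are
reassembled from the hunks above, so the exact surroundings are assumptions):

	if (swap_duplicate(entry) < 0) {
		set_pte_at(mm, address, pvmw.pte, pteval);
		ret = false;
		page_vma_mapped_walk_done(&pvmw);
		break;
	}
	/* Moved up: fail before arch_unmap_one() allocates anything. */
	if (anon_exclusive && page_try_share_anon_rmap(subpage)) {
		swap_free(entry);	/* undo swap_duplicate() */
		set_pte_at(mm, address, pvmw.pte, pteval);
		ret = false;
		page_vma_mapped_walk_done(&pvmw);
		break;
	}
	if (arch_unmap_one(mm, vma, address, pteval) < 0) {
		swap_free(entry);	/* undo swap_duplicate() */
		set_pte_at(mm, address, pvmw.pte, pteval);
		ret = false;
		page_vma_mapped_walk_done(&pvmw);
		break;
	}

This way, once arch_unmap_one() succeeds there is no failure path left that
would need to revert it.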
I really want clearing PG_anon_exclusive to be the last action, without
eventually having to set it again and overcomplicating
PG_anon_exclusive/rmap handling. Ideally, we'd have an "arch_remap_one()"
that just reverts what arch_unmap_one() did.
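Something along these lines, perhaps -- a sketch only; the name
arch_remap_one(), the __HAVE_ARCH_REMAP_ONE override convention, and the
signature mirroring arch_unmap_one() are all assumptions:

#ifndef __HAVE_ARCH_REMAP_ONE
/*
 * Sketch: revert what a successful arch_unmap_one() did for this mapping,
 * e.g., free any per-page metadata it allocated. Called on error paths
 * where unmapping still fails after arch_unmap_one() already succeeded.
 */
static inline void arch_remap_one(struct mm_struct *mm,
				  struct vm_area_struct *vma,
				  unsigned long addr, pte_t orig_pte)
{
}
#endif

sparc would then provide an implementation that restores and frees the saved
tags (presumably building on adi_restore_tags(), though whether that can be
reused as-is on such an error path is an open question).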
Hi David,
arch_remap_one() sounds reasonable. Would you like to add that to your patch series, or would you like me to create a
separate patch adding it on top of your patch series?
arch_unmap_one() calls adi_save_tags(), which allocates memory.
adi_restore_tags()->del_tag_store() reverts that operation and ends up
freeing memory conditionally; however, it's only
called from arch_do_swap_page().
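To summarize the asymmetry as I understand it (a rough call graph pieced
together from the above, not verified line by line):

  try_to_unmap_one() / try_to_migrate_one()
    -> arch_unmap_one() -> adi_save_tags()        /* allocates tag storage */

  do_swap_page()
    -> arch_do_swap_page() -> adi_restore_tags()
         -> del_tag_store()                       /* frees tag storage */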
Here is why I have to scratch my head:
a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar
for mm/swapfile.c:unuse_pte(); aren't we missing something?
My understanding of this code path may be flawed, so do correct me if this does not sound right. unuse_pte() is called
when the user turns off swap on a device. It is called by unuse_pte_range(), which reads the page back in from the swap
device before calling unuse_pte(). Once the page is read back in from swap, an access to the va for the page will
ultimately result in a call to __handle_mm_fault(), which in turn will call handle_pte_fault() to insert a new pte for
this mapping, and handle_pte_fault() will call arch_do_swap_page() which will restore the tags.
unuse_pte() will replace a swap pte directly with a proper, present pte,
just like do_swap_page() would. You won't end up in do_swap_page()
anymore and arch_do_swap_page() won't be called, because there is no
swap PTE anymore.
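For reference, the core of mm/swapfile.c:unuse_pte() looks roughly like this
(heavily simplified, error handling and the KSM copy case elided):

	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
	get_page(page);
	/* The swap pte is replaced by a present pte right here ... */
	set_pte_at(vma->vm_mm, addr, pte,
		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
	page_add_anon_rmap(page, vma, addr, false);
	/*
	 * ... so there is no swap pte left to fault on: do_swap_page() and
	 * thus arch_do_swap_page() never run for this mapping, and the tags
	 * saved by arch_unmap_one() are neither restored nor freed.
	 */
	swap_free(entry);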
b) try_to_migrate_one() does the arch_unmap_one(), but who will do the
restore+free after migration succeeds or fails; aren't we missing something?
try_to_migrate_one() replaces the current pte with a migration pte after calling arch_unmap_one(). This causes
__handle_mm_fault() to be called when a reference is made to the va covered by the migration pte. This will in turn
result in a call to arch_do_swap_page() which restores the tags.
Migration PTEs are restored via mm/migrate.c:remove_migration_ptes().
arch_do_swap_page() won't be called.
What you describe is the case where someone accesses the migration PTE while
migration is active and the migration PTEs have not been removed yet.
While we'll end up in do_swap_page(), we'll do a migration_entry_wait(),
followed by an effectively immediate "return 0;". arch_do_swap_page()
won't get called.
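For reference, the relevant control flow at the start of
mm/memory.c:do_swap_page() is roughly (simplified sketch, other non-swap
entry types elided):

	entry = pte_to_swp_entry(vmf->orig_pte);
	if (unlikely(non_swap_entry(entry))) {
		if (is_migration_entry(entry)) {
			/* Just wait until migration finishes ... */
			migration_entry_wait(vma->vm_mm, vmf->pmd,
					     vmf->address);
		}
		/*
		 * ... then bail out; the arch_do_swap_page() call further
		 * down in do_swap_page() is never reached for a migration
		 * entry.
		 */
		goto out;
	}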
So in essence, I think this doesn't work as expected yet. In the best
case we don't immediately free memory. In the worst case we lose the tags.
I see what you mean. I can work on fixing these issues up.
Thanks,
Khalid