> Hi David, Hi, > > arch_remap_one() sounds reasonable. Would you like to add that in your patch series, or would you like me to create a > separate patch for adding this on top of your patch series? Let's handle it separately, because I think there is still a lot to be clarified. I also feel like the terminology ("arch_unmap_one()") is a little too generic, if we really only care about anonymous pages (do we?). > >> >>> >>>> >>>> arch_unmap_one() calls adi_save_tags(), which allocates memory. >>>> adi_restore_tags()->del_tag_store() reverts that operation and ends up >>>> freeing memory conditionally; However, it's only >>>> called from arch_do_swap_page(). >>>> >>>> >>>> Here is why I have to scratch my head: >>>> >>>> a) arch_do_swap_page() is only called from do_swap_page(). We don't do anything similar >>>> for mm/swapfile.c:unuse_pte(), aren't we missing something? >>> >>> My understanding of this code path maybe flawed, so do correct me if this does not sound right. unused_pte() is called >>> upon user turning off swap on a device. unused_pte() is called by unused_pte_range() which swaps the page back in from >>> swap device before calling unuse_pte(). Once the page is read back in from swap, ultimately access to the va for the >>> page will result in call to __handle_mm_fault() which in turn will call handle_pte_fault() to insert a new pte for this >>> mapping and handle_pte_fault() will call arch_do_swap_page() which will restore the tags. >> >> unuse_pte() will replace a swap pte directly by a proper, present pte, >> just like do_swap_page() would. You won't end up in do_swap_page() >> anymore and arch_do_swap_page() won't be called, because there is no >> swap PTE anymore. >> >>> >>>> >>>> b) try_to_migrate_one() does the arch_unmap_one(), but who will do the >>>> restore+free after migration succeeded or failed, aren't we missing something? >>> >>> try_to_migrate_one() replaces the current pte with a migration pte after calling arch_unmap_one(). This causes >>> __handle_mm_fault() to be called when a reference to the va covered by migration pte is made. This will in turn finally >>> result in a call to arch_do_swap_page() which restores the tags. >> >> Migration PTEs are restore via mm/migrate.c:remove_migration_ptes(). >> arch_do_swap_page() won't be called. >> >> What you mention is if someone accesses the migration PTE while >> migration is active and the migration PTEs have not been removed yet. >> While we'll end up in do_swap_page(), we'll do a migration_entry_wait(), >> followed by an effective immediate "return 0;". arch_do_swap_page() >> won't get called. >> >> >> So in essence, I think this doesn't work as expected yet. In the best >> case we don't immediately free memory. In the worst case we lose the tags. >> > > I see what you mean. I can work on fixing these issues up. Ideally, I think we'd handle migration differently: migrate the tags directly instead of temporarily storing them I also wonder, how to handle migration of THP (which uses migration PMDs) and THP splitting (which also uses migration PTEs); maybe they don't apply on sparc? Further, I wonder if you have to handle zapping of swap/migration PTEs, and if you'd also have to cleanup+freeup any allcoated tag memory? E.g., zap_pte_range() can simply zap swap and migration entries, wouldn't Last but not least, I do wonder if mremap() -- e.g., move_pte() -- and fork() -- e.g., copy_pte_range() -- are handled properly, where we can end up moving/copying swap+migration PTEs around. -- Thanks, David / dhildenb