On Mon, Aug 12, 2013 at 9:46 PM, Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> wrote: > On Mon, Aug 12, 2013 at 05:49:37PM -0400, Jerome Glisse wrote: >> On Mon, Aug 12, 2013 at 05:36:40PM -0400, Naoya Horiguchi wrote: >> > Hi Jerome, >> > >> > On Mon, Aug 12, 2013 at 11:43:24AM -0400, j.glisse@xxxxxxxxx wrote: >> > > From: Jerome Glisse <jglisse@xxxxxxxxxx> >> > > >> > > Prior to this copy_one_pte will never reach the special swap file >> > > handling code because swap_duplicate will return invalid value. >> > > >> > > Note this is not fatal so nothing bad ever happen because of that. >> > > Reason is that copy_pte_range would break of its loop and call >> > > add_swap_count_continuation which would see its a special swap >> > > file and return 0 triggering copy_pte_range to try again. Because >> > > we try again there is a huge chance that the temporarily special >> > > migration pte is now again valid and pointing to a new valid page. >> > > >> > > This patch just split handling of special swap entry from regular >> > > one inside copy_one_pte. >> > > >> > > (Note i spotted that while reading code i haven't tested my theory.) >> > > >> > > Signed-off-by: Jerome Glisse <jglisse@xxxxxxxxxx> >> > >> > non_swap_entry() means not only migration entry, but also hwpoison entry, >> > so it seems to me that simply moving the swap_duplicate() into the >> > if(!non_swap_entry) block can change the behavior for hwpoison entry. >> > Would it be nice to add check for such a case? >> > >> > Thanks, >> > Naoya Horiguchi >> >> Well if my reading of the code is right for hwpoison entry current code will >> loop indefinitly inside the kernel on fork if one entry is set to hwpoison. > > (Sorry if I missed something, but) __swap_duplicate always returns > -EINVAL if non_swap_entry is true and then swap_duplicate returns 0. > So copy_one_pte() doesn't return at if (swap_duplicate(entry) < 0) > block for non_swap_entry. So ... No, it's my fault i didn't pay attention to swap_duplicate only to __swap_duplicate, also means my bug is elsewhere. Thanks for looking. Cheers, Jerome >> > > Prior to this copy_one_pte will never reach the special swap file >> > > handling code because swap_duplicate will return invalid value. > > this seems not correct to me. > Could you explain more about this point? > (Maybe I don't understand the terminology "special swap file"...) > >> My patch does not handle hwpoison because it seems useless as there is nothing >> to do for hwpoison pte beside giving setting the new pte to hwpoison to. So >> the fork child will also have a pte with hwpoison. My patch do just that. > > Yes, just copying hwpoison entry looks a right behavior to me. > > Thanks, > Naoya Horiguchi > >> So change in behavior is current kernel loop indefinitly in kernel with hwpoison >> pte on fork, vs child get hwpoison pte with my patch. Meaning that both child >> and father can live as long as they dont access the hwpoisoned ptes. >> >> > >> > > --- >> > > mm/memory.c | 26 +++++++++++++------------- >> > > 1 file changed, 13 insertions(+), 13 deletions(-) >> > > >> > > diff --git a/mm/memory.c b/mm/memory.c >> > > index 1ce2e2a..9f907dd 100644 >> > > --- a/mm/memory.c >> > > +++ b/mm/memory.c >> > > @@ -833,20 +833,20 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, >> > > if (!pte_file(pte)) { >> > > swp_entry_t entry = pte_to_swp_entry(pte); >> > > >> > > - if (swap_duplicate(entry) < 0) >> > > - return entry.val; >> > > - >> > > - /* make sure dst_mm is on swapoff's mmlist. */ >> > > - if (unlikely(list_empty(&dst_mm->mmlist))) { >> > > - spin_lock(&mmlist_lock); >> > > - if (list_empty(&dst_mm->mmlist)) >> > > - list_add(&dst_mm->mmlist, >> > > - &src_mm->mmlist); >> > > - spin_unlock(&mmlist_lock); >> > > - } >> > > - if (likely(!non_swap_entry(entry))) >> > > + if (likely(!non_swap_entry(entry))) { >> > > + if (swap_duplicate(entry) < 0) >> > > + return entry.val; >> > > + >> > > + /* make sure dst_mm is on swapoff's mmlist. */ >> > > + if (unlikely(list_empty(&dst_mm->mmlist))) { >> > > + spin_lock(&mmlist_lock); >> > > + if (list_empty(&dst_mm->mmlist)) >> > > + list_add(&dst_mm->mmlist, >> > > + &src_mm->mmlist); >> > > + spin_unlock(&mmlist_lock); >> > > + } >> > > rss[MM_SWAPENTS]++; >> > > - else if (is_migration_entry(entry)) { >> > > + } else if (is_migration_entry(entry)) { >> > > page = migration_entry_to_page(entry); >> > > >> > > if (PageAnon(page)) >> > > -- >> > > 1.8.3.1 >> > > >> > > -- >> > > To unsubscribe, send a message with 'unsubscribe linux-mm' in >> > > the body to majordomo@xxxxxxxxx. For more info on Linux MM, >> > > see: http://www.linux-mm.org/ . >> > > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> >> > > >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@xxxxxxxxx. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> >> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>