On Sat, Mar 31, 2012 at 4:07 AM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
> The race is as follows.  Suppose a multi-threaded task forks a new
> process, thus bumping up the ref count on all the pages.  While the fork
> is occurring (and thus we have marked all the PTEs as read-only), another
> thread in the original process tries to write to a huge page, taking an
> access violation from the write-protect and calling hugetlb_cow().  Now,
> suppose the fork() fails.  It will undo the COW and decrement the ref
> count on the pages, so the ref count on the huge page drops back to 1.
> Meanwhile hugetlb_cow() also decrements the ref count by one on the
> original page, since the original address space doesn't need it any more,
> having copied a new page to replace the original page.  This leaves the
> ref count at zero, and when we call unlock_page(), we panic.
>
> The solution is to take an extra reference to the page while we are
> holding the lock on it.
>

If the following chart matches the above description,

===
fork on CPU A                           fault on CPU B
=============                           ==============
...
down_write(&parent->mmap_sem);
down_write_nested(&child->mmap_sem);
...
while duplicating vmas
        if error
                break;
...
up_write(&child->mmap_sem);
up_write(&parent->mmap_sem);            ...
                                        down_read(&parent->mmap_sem);
                                        ...
                                        lock_page(page);
                                        handle COW
                                        page_mapcount(old_page) == 2
                                        alloc and prepare new_page
...
handle error
page_remove_rmap(page);
put_page(page);
                                        ...
                                        fold new_page into pte
                                        page_remove_rmap(page);
                                        put_page(page);
                                        ...
                              oops ==>  unlock_page(page);
                                        up_read(&parent->mmap_sem);
===

would you please respin with the description refreshed?

> Cc: stable@xxxxxxxxxx

Let Andrew do the stable work, ok?

Best Regards
-hd

> Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
> ---
> This change incorporates Hillf Danton's suggestion to just unconditionally
> get and put the page around the region of code in question.
>
>  mm/hugetlb.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1871753..5f53d6b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2701,6 +2701,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 * so no worry about deadlock.
>  	 */
>  	page = pte_page(entry);
> +	get_page(page);
>  	if (page != pagecache_page)
>  		lock_page(page);
>
> @@ -2732,6 +2733,7 @@ out_page_table_lock:
>  	}
>  	if (page != pagecache_page)
>  		unlock_page(page);
> +	put_page(page);
>
>  out_mutex:
>  	mutex_unlock(&hugetlb_instantiation_mutex);
> --
> 1.6.5.2
>
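
[Editorial aside, not part of the original thread.]  For readers tracing the
refcount math above, here is a minimal user-space sketch of the pattern the
one-line fix relies on: whoever holds the page lock must also hold a reference
of its own, or a concurrent put_page() can free the page while it is still
locked.  Everything in it (fake_page, fake_get_page()/fake_put_page(), the
KEEP_EXTRA_REF switch) is invented illustration code, not kernel code, written
with C11 atomics and pthreads.

/* Hypothetical model of the fork()/hugetlb_fault() refcount race.
 * Build with: cc -std=c11 -pthread race_sketch.c
 * With KEEP_EXTRA_REF set to 0 the "fault" thread may end up unlocking a
 * page whose refcount already hit zero; with 1 it mirrors the patch's
 * get_page()/put_page() pair and the page stays alive across unlock.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define KEEP_EXTRA_REF 1        /* models the patch's get_page()/put_page() */

struct fake_page {
	atomic_int refcount;    /* starts at 2: original mapping + fork */
	pthread_mutex_t lock;   /* stands in for the page lock */
	atomic_int freed;       /* set when the refcount drops to zero */
};

static void fake_get_page(struct fake_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

static void fake_put_page(struct fake_page *p)
{
	if (atomic_fetch_sub(&p->refcount, 1) == 1)
		atomic_store(&p->freed, 1);     /* "returned to allocator" */
}

/* CPU A: fork() fails and drops the reference it had taken. */
static void *forker(void *arg)
{
	fake_put_page(arg);
	return NULL;
}

/* CPU B: the COW path drops the original page while holding its lock. */
static void *faulter(void *arg)
{
	struct fake_page *p = arg;

	if (KEEP_EXTRA_REF)
		fake_get_page(p);               /* the patch's get_page() */

	pthread_mutex_lock(&p->lock);           /* lock_page(page) */
	fake_put_page(p);                       /* COW dropped the old page */

	/* Without the extra reference, both put operations may have run by
	 * now and the page is gone while we still "hold" its lock. */
	if (atomic_load(&p->freed))
		printf("BUG: unlocking a freed page (the reported panic)\n");

	pthread_mutex_unlock(&p->lock);         /* unlock_page(page) */

	if (KEEP_EXTRA_REF)
		fake_put_page(p);               /* the patch's put_page() */
	return NULL;
}

int main(void)
{
	struct fake_page page = { .refcount = 2 };
	pthread_t a, b;

	pthread_mutex_init(&page.lock, NULL);
	pthread_create(&a, NULL, forker, &page);
	pthread_create(&b, NULL, faulter, &page);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("final refcount: %d, freed: %d\n",
	       atomic_load(&page.refcount), atomic_load(&page.freed));
	return 0;
}

With KEEP_EXTRA_REF enabled the lock holder's reference keeps the model page's
refcount above zero until after the unlock, which is exactly what the two
added lines in the quoted hunks do for the real huge page.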