Andrea Arcangeli <aarcange@xxxxxxxxxx> writes:

> On Thu, Nov 19, 2015 at 04:22:55PM -0800, Andrew Morton wrote:
>> On Thu, 19 Nov 2015 14:00:51 +0100 Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
>>
>> > This theoretical SMP race condition was found by source review. No
>> > real-life app could be affected, as freeing memory while still
>> > accessing it is either undefined or a workload that produces no
>> > information.
>> >
>> > For something to go wrong because the SMP race condition triggered,
>> > it'd require a further tiny window within the SMP race condition
>> > window. So nothing bad happens in practice even if the SMP race
>> > condition triggers. It's still better to apply the fix to have the
>> > math guarantee.
>> >
>> > The fix just adds a thp_mmu_gather atomic_t counter to THP pages,
>> > so split_huge_page can elevate the tail page count accordingly and
>> > leave the tail page freeing to whoever elevated thp_mmu_gather.
>> >
>>
>> This is a pretty nasty patch :( We now have random page*'s with bit 0
>> set floating around in mmu_gather.__pages[]. It assumes/requires that
>
> Yes, and bit 0 is only relevant for the mmu_gather structure and its
> users.
>
>> nobody uses those pages until they hit release_pages(). And the tlb
>> flushing code is pretty twisty, with various Kconfig and arch dependent
>> handlers.
>
> I already reviewed all callers and the mmu_gather freeing path for all
> archs to be sure they all take the two paths that I updated in order
> to free the pages. They call free_page_and_swap_cache or
> free_pages_and_swap_cache.
>
> The freeing uses common VM code, and an arch shouldn't improvise by
> calling put_page() manually: the freeing has to take care of
> collecting the swapcache if needed, etc. It has to deal with VM
> details the arch is not aware of.

But it is still a lot of really complicated code.

>> Is there no nicer way?
> We can grow the size of the mmu_gather to keep track that the page was
> a THP before the pmd_lock was dropped, in a separate bit from the
> struct page pointer, but it'll take more memory.
>
> This bit 0 looks like a walk in the park compared to the bit 0 in
> page->compound_head that was just introduced. The compound_head bit 0
> isn't only visible to the mmu_gather users (which should never try to
> mess with the page pointer themselves), and it collides with the
> lru/next and rcu_head.next users.
>
> If you prefer me to enlarge the mmu_gather structure I can do that.

If we can update mmu_gather to track the page size of the pages, that
will also help some archs to better implement tlb_flush(struct
mmu_gather *). Right now arch/powerpc/mm/tlb_nohash.c flushes the tlb
mapping for the entire mm_struct.

We can also make sure that we do a force flush when we are trying to
gather pages of different sizes. So one instance of mmu_gather would
end up gathering pages of a specific size only?

> 1 bit of extra information needs to be extracted before dropping the
> pmd_lock in zap_huge_pmd() and it has to be available in
> release_pages(), to know if the tail pages need an explicit put_page
> or not. It's basically a bit in the local stack, except it's not in
> the local stack because it is an array of pages, so it needs many
> bits and it's stored in the mmu_gather along with the page.
>
> Aside from the implementation of bit 0, I can't think of a simpler
> design that provides the same performance and low locking overhead
> (this patch actually looks like an optimization when it's not helping
> to prevent the race, because to fix the race I had to reduce the
> number of times the lru_lock is taken in release_pages).
>
>> > +/*
>> > + * free_trans_huge_page_list() is used to free the pages returned by
>> > + * trans_huge_page_release() (if still PageTransHuge()) in
>> > + * release_pages().
>> > + */
>>
>> There is no function trans_huge_page_release().
> Oops, I updated the function name but not the comment... thanks!
>
> Andrea
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index f7ae08f..2810322 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -225,9 +225,8 @@ static inline int trans_huge_mmu_gather_count(struct page *page)
>  }
>  
>  /*
> - * free_trans_huge_page_list() is used to free the pages returned by
> - * trans_huge_page_release() (if still PageTransHuge()) in
> - * release_pages().
> + * free_trans_huge_page_list() is used to free THP pages (if still
> + * PageTransHuge()) in release_pages().
>   */
>  extern void free_trans_huge_page_list(struct list_head *list);
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <dont@xxxxxxxxx>