On 06/12/23 16:34, Mike Kravetz wrote:
> On 06/12/23 18:41, Matthew Wilcox wrote:
> > On Tue, Nov 01, 2022 at 03:30:55PM -0700, Sidhartha Kumar wrote:
> > > +++ b/mm/hugetlb.c
> > > @@ -2815,7 +2815,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
> > >  int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
> > >  {
> > >  	struct hstate *h;
> > > -	struct page *head;
> > > +	struct folio *folio = page_folio(page);
> > 
> > Is this safe? I was reviewing a different patch today, and I spotted
> > this. With THP, we can relatively easily hit this case:
> > 
> > struct page points to a page with pfn 0x40305, in a folio of order 2.
> > We call page_folio() on it and the resulting pointer is for the folio
> > with pfn 0x40304.
> > If we don't have our own refcount (or some other protection ...) against
> > freeing, the folio can now be freed and reallocated. Say it's now part
> > of an order-3 folio.
> > Our 'folio' pointer is now actually a pointer to a tail page, and we
> > have various assertions that a folio pointer doesn't point to a tail
> > page, so they trigger.
> > 
> > It seems to me that this ...
> > 
> >         /*
> >          * The page might have been dissolved from under our feet, so make sure
> >          * to carefully check the state under the lock.
> >          * Return success when racing as if we dissolved the page ourselves.
> >          */
> >         spin_lock_irq(&hugetlb_lock);
> >         if (folio_test_hugetlb(folio)) {
> >                 h = folio_hstate(folio);
> >         } else {
> >                 spin_unlock_irq(&hugetlb_lock);
> >                 return 0;
> >         }
> > 
> > implies that we don't have our own reference on the folio, so we might
> > find a situation where the folio pointer we have is no longer a folio
> > pointer.
> 
> Your analysis is correct.
> 
> This is not safe because we hold no locks or references. The folio
> pointer obtained via page_folio(page) may not be valid when calling
> folio_test_hugetlb(folio) and later.
> 
> My bad for the Reviewed-by: :(
> 

I was looking at this more closely and need a bit of clarification.

As mentioned, your analysis is correct. However, it appears that there
is other code doing:

	folio = page_folio(page);
	...
	if (folio_test_hugetlb(folio))

without holding a folio ref or some type of lock. split_huge_pages_all()
is one such example. So, either this code has the same issue or there
are folio routines that can be called without holding a ref/lock.

The kerneldoc for folio_test_hugetlb says "Caller should have a reference
on the folio to prevent it from being turned into a tail page." However,
is that mostly to make sure the returned value is consistent/valid? Can
it really lead to an assert if the folio pointer is changed to point to
something else?

> > Maybe the page_folio() call should be moved inside the hugetlb_lock
> > protection?  Is that enough?  I don't know enough about how hugetlb
> > pages are split, freed & allocated to know what's going on.

Upon further thought, I think we should move the page_folio() inside the
lock just to be more correct.
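
Something like the untested sketch below is what I have in mind. It only
resolves the folio after taking hugetlb_lock and leaves everything else
as is; whether that alone is enough for the later uses is a separate
question (your point below):

int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
{
	struct hstate *h;
	struct folio *folio;

	/*
	 * The page might have been dissolved from under our feet, so make sure
	 * to carefully check the state under the lock.
	 * Return success when racing as if we dissolved the page ourselves.
	 */
	spin_lock_irq(&hugetlb_lock);
	folio = page_folio(page);	/* now resolved under hugetlb_lock */
	if (folio_test_hugetlb(folio)) {
		h = folio_hstate(folio);
	} else {
		spin_unlock_irq(&hugetlb_lock);
		return 0;
	}
	spin_unlock_irq(&hugetlb_lock);
	...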

> > But then we _drop_ the lock, and keep referring to ...
> > 
> > > @@ -2841,10 +2840,10 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
> > >  	if (hstate_is_gigantic(h))
> > >  		return -ENOMEM;
> > >  
> > > -	if (page_count(head) && !isolate_hugetlb(head, list))
> > > +	if (folio_ref_count(folio) && !isolate_hugetlb(&folio->page, list))
> > >  		ret = 0;
> > > -	else if (!page_count(head))
> > > -		ret = alloc_and_dissolve_huge_page(h, head, list);
> > > +	else if (!folio_ref_count(folio))
> > > +		ret = alloc_and_dissolve_huge_page(h, &folio->page, list);
> 
> The above was OK when using struct page instead of folio. The 'racy'
> part was getting the ref count on the head page. It was OK because this
> was only a check to see if we should TRY to isolate or dissolve. The
> code to actually isolate or dissolve would take the appropriate locks.

page_count() is doing 'folio_ref_count(page_folio(page));', and I suspect
there are many places doing page_count() without taking a page ref or
locking. So, it seems like this would also be safe?

> I'm afraid the code is now making even more use of a potentially invalid
> folio.  Here is how the above now looks in v6.3:
> 
> 	spin_unlock_irq(&hugetlb_lock);
> 
> 	/*
> 	 * Fence off gigantic pages as there is a cyclic dependency between
> 	 * alloc_contig_range and them. Return -ENOMEM as this has the effect
> 	 * of bailing out right away without further retrying.
> 	 */
> 	if (hstate_is_gigantic(h))
> 		return -ENOMEM;
> 
> 	if (folio_ref_count(folio) && isolate_hugetlb(folio, list))
> 		ret = 0;
> 	else if (!folio_ref_count(folio))
> 		ret = alloc_and_dissolve_hugetlb_folio(h, folio, list);
> 
> Looks like that potentially invalid folio is being passed to other
> routines.  Previous code would take lock and revalidate that struct page
> was still a hugetlb page.  We can not do the same with a folio.

Perhaps I spoke too soon. Yes, we pass a potentially invalid folio
pointer to isolate_hugetlb() and alloc_and_dissolve_hugetlb_folio().
However, it seems the validation they perform should be sufficient:

bool isolate_hugetlb(struct folio *folio, struct list_head *list)
{
	bool ret = true;

	spin_lock_irq(&hugetlb_lock);
	if (!folio_test_hugetlb(folio) ||
	    !folio_test_hugetlb_migratable(folio) ||
	    !folio_try_get(folio)) {
		ret = false;
		goto unlock;

static int alloc_and_dissolve_hugetlb_folio(struct hstate *h,
			struct folio *old_folio, struct list_head *list)
{
	...
retry:
	spin_lock_irq(&hugetlb_lock);
	if (!folio_test_hugetlb(old_folio)) {
		...
	} else if (folio_ref_count(old_folio)) {
		...
	} else if (!folio_test_hugetlb_freed(old_folio)) {
		...
		goto retry;
	} else {
		/*
		 * Ok, old_folio is still a genuine free hugepage.

Upon further consideration, I do not see an issue with the existing code.
If there are issues with calling folio_test_hugetlb() or folio_ref_count()
on a potentially invalid folio pointer, then we do have issues here.
However, such an issue would be more widespread, as there is more code
doing the same.
-- 
Mike Kravetz
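
For reference, here is the page_count() chain mentioned above as it reads
in recent kernels (quoted from memory, so worth double-checking against
include/linux/mm.h and include/linux/page_ref.h):

/* include/linux/mm.h */
static inline int page_count(const struct page *page)
{
	return folio_ref_count(page_folio(page));
}

/* include/linux/page_ref.h */
static inline int folio_ref_count(const struct folio *folio)
{
	return page_ref_count(&folio->page);
}

static inline int page_ref_count(const struct page *page)
{
	return atomic_read(&page->_refcount);
}

So callers of page_count() that hold no reference are doing the same
page_folio() + folio_ref_count() sequence discussed here.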