On Thu, May 20, 2021 at 12:18 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote: > > On Thu, May 13, 2021 at 5:14 PM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: > > > > On 5/13/21 4:49 PM, Mina Almasry wrote: > > > On Thu, May 13, 2021 at 4:43 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote: > > >> > > >> When hugetlb_mcopy_atomic_pte() is called with: > > >> - mode==MCOPY_ATOMIC_NORMAL and, > > >> - we already have a page in the page cache corresponding to the > > >> associated address, > > >> > > >> We will allocate a huge page from the reserves, and then fail to insert it > > >> into the cache and return -EEXIST. In this case, we need to return -EEXIST > > >> without allocating a new page as the page already exists in the cache. > > >> Allocating the extra page causes the resv_huge_pages to underflow temporarily > > >> until the extra page is freed. > > >> > > >> To fix this we check if a page exists in the cache, and allocate it and > > >> insert it in the cache immediately while holding the lock. After that we > > >> copy the contents into the page. > > >> > > >> As a side effect of this, pages may exist in the cache for which the > > >> copy failed and for these pages PageUptodate(page) == false. Modify code > > >> that query the cache to handle this correctly. > > >> > > > > > > To be honest, I'm not sure I've done this bit correctly. Please take a > > > look and let me know what you think. It may be too overly complicated > > > to have !PageUptodate() pages in the cache and ask the rest of the > > > code to handle that edge case correctly, but I'm not sure how else to > > > fix this issue. > > > > > > > I think you just moved the underflow from hugetlb_mcopy_atomic_pte to > > hugetlb_no_page. Why? > > > > Consider the case where there is only one reserve left and someone does > > the MCOPY_ATOMIC_NORMAL for the address. We will allocate the page and > > consume the reserve (reserve count == 0) and insert the page into the > > cache. Now, if the copy_huge_page_from_user fails we must drop the > > locks/fault mutex to do the copy. While locks are dropped, someone > > faults on the address and ends up in hugetlb_no_page. The page is in > > the cache but not up to date, so we go down the allocate new page path > > and will decrement the reserve count again to cause underflow. > > > > How about this approach? > > - Keep the check for hugetlbfs_pagecache_present in hugetlb_mcopy_atomic_pte > > that you added. That will catch the race where the page was added to > > the cache before entering the routine. > > - With the above check in place, we only need to worry about the case > > where copy_huge_page_from_user fails and we must drop locks. In this > > case we: > > - Free the page previously allocated. > > - Allocate a 'temporary' huge page without consuming reserves. I'm > > thinking of something similar to page migration. > > - Drop the locks and let the copy_huge_page_from_user be done to the > > temporary page. > > - When reentering hugetlb_mcopy_atomic_pte after dropping locks (the > > *pagep case) we need to once again check > > hugetlbfs_pagecache_present. > > - We then try to allocate the huge page which will consume the > > reserve. If successful, copy contents of temporary page to newly > > allocated page. Free temporary page. > > > > There may be issues with this, and I have not given it deep thought. It > > does abuse the temporary huge page concept, but perhaps no more than > > page migration. Things do slow down if the extra page allocation and > > copy is required, but that would only be the case if copy_huge_page_from_user > > needs to be done without locks. Not sure, but hoping that is rare. > > Just following up this a bit: I've implemented this approach locally, > and with it it's passing the test as-is. When I hack the code such > that the copy in hugetlb_mcopy_atomic_pte() always fails, I run into > this edge case, which causes resv_huge_pages to underflow again (this > time permemantly): > > - hugetlb_no_page() is called on an index and a page is allocated and > inserted into the cache consuming the reservation. > - remove_huge_page() is called on this index and the page is removed from cache. > - hugetlb_mcopy_atomic_pte() is called on this index, we do not find > the page in the cache and we trigger this code patch and the copy > fails. > - The allocations in this code path seem to double consume the > reservation and resv_huge_pages underflows. > > I'm looking at this edge case to understand why a prior > remove_huge_page() causes my code to underflow resv_huge_pages. > I should also mention, without a prior remove_huge_page() this code path works fine, so it seems the fact that the reservation is consumed before causes trouble, but I'm not sure why yet. > > -- > > Mike Kravetz