On 5/10/21 5:33 PM, Mike Kravetz wrote:
> On 5/7/21 2:21 PM, Mina Almasry wrote:
>> I ran into a bug that I'm not sure how to solve so I'm wondering if
>> anyone has suggestions on what the issue could be and how to
>> investigate. I added the WARN_ON_ONCE() here to catch instances of
>> resv_huge_pages underflowing:
>>
>
> I am fairly confident the issue is with hugetlb_mcopy_atomic_pte. It
> does not detect/handle the case where a page cache page already exists
> when in MCOPY_ATOMIC_NORMAL mode. If you add a printk/warning after the
> failure of huge_add_to_page_cache, these will generally correspond to
> the underflow. From a reservation POV, if the page exists in the cache
> the reservation was already consumed. The call to alloc_huge_page will
> 'consume' another reservation which can lead to the underflow. As you
> noted, this underflow gets cleaned up in the error path. However, we
> should prevent it from happening as we do not want anyone making
> decisions on that underflow value.
>
> hugetlb_mcopy_atomic_pte should check for a page in the cache and if it
> exists use it in MCOPY_ATOMIC_NORMAL. This code is quite tricky and my
> first simple attempt at this did not work. I am happy to continue
> working on this. However, if you or anyone else want to jump in and fix
> feel free.

I looked at this a bit more today and am not exactly sure of the expected
behavior. The situation is:
- UFFDIO_COPY is called for hugetlb mapping
- the dest address is in a shared mapping
- there is a page in the cache associated with the address in the shared
  mapping

Currently, the code will fail when trying to update the page cache as the
entry already exists. The shm code appears to do the same.

Quick question. Is this the expected behavior? Or, would you expect the
UFFDIO_COPY to update the page in the page cache, and then resolve the
fault/update the pte?
-- 
Mike Kravetz