On Wed, Feb 24, 2021 at 6:32 AM Oscar Salvador <osalvador@xxxxxxx> wrote: > > On Tue, Feb 23, 2021 at 04:41:28PM +0100, Oscar Salvador wrote: > > On Tue, Feb 23, 2021 at 11:50:05AM +0100, Oscar Salvador wrote: > > > > CPU0: CPU1: > > > > set_compound_page_dtor(HUGETLB_PAGE_DTOR); > > > > memory_failure_hugetlb > > > > get_hwpoison_page > > > > __get_hwpoison_page > > > > get_page_unless_zero > > > > put_page_testzero() > > > > > > > > Maybe this can happen. But it is a very corner case. If we want to > > > > deal with this. We can put_page_testzero() first and then > > > > set_compound_page_dtor(HUGETLB_PAGE_DTOR). > > > > > > I have to check further, but it looks like this could actually happen. > > > Handling this with VM_BUG_ON is wrong, because memory_failure/soft_offline are > > > entitled to increase the refcount of the page. > > > > > > AFAICS, > > > > > > CPU0: CPU1: > > > set_compound_page_dtor(HUGETLB_PAGE_DTOR); > > > memory_failure_hugetlb > > > get_hwpoison_page > > > __get_hwpoison_page > > > get_page_unless_zero > > > put_page_testzero() > > > identify_page_state > > > me_huge_page > > > > > > I think we can reach me_huge_page with either refcount = 1 or refcount =2, > > > depending whether put_page_testzero has been issued. > > > > > > For now, I would not re-enqueue the page if put_page_testzero == false. > > > I have to see how this can be handled gracefully. > > > > I took a brief look. > > It is not really your patch fault. Hugetlb <-> memory-failure synchronization is > > a bit odd, it definitely needs improvment. > > > > The thing is, we can have different scenarios here. > > E.g: by the time we return from put_page_testzero, we might have refcount == > > 0 and PageHWPoison, or refcount == 1 PageHWPoison. > > > > The former will let a user get a page from the pool and get a sigbus > > when it faults in the page, and the latter will be even more odd as we > > will have a self-refcounted page in the free pool (and hwpoisoned). I have been looking at the dequeue_huge_page_node_exact(). If a PageHWPoison huge page is in the free pool list, the page will not be allocated to the user. The PageHWPoison huge page will be skip in the dequeue_huge_page_node_exact(). > > > > As I said, it is not this patchset fault. I just made me realize this > > problem. > > > > I have to think some more about this. > > I have been thinking more about this. > memory failure events can occur at any time, and we might not be in a > position where we can handle gracefully the error, meaning that the page > might end up in non desirable state. > > E.g: we could flag the page right before enqueing it. > > I still think that VM_BUG_ON should go, as the refcount can be perfectly > increased by memory-failure/soft_offline handlers, so BUGing there does > not make much sense. Make sense. I will remove the VM_BUG_ON. > > One think we could do is to check the state of the page we want to > retrieve from the free hugepage pool. > We should discard any HWpoisoned ones, and dissolve them. > > The thing is, memory-failure/soft_offline should allocate a new hugepage > for the free pool, so keep the pool stable. > Something like [1]. > > Anyway, this is orthogonal to this patch, and something I will work on > soon. > > [1] https://lore.kernel.org/linux-mm/20210222135137.25717-2-osalvador@xxxxxxx/T/#u Thanks for your efforts on this. > > -- > Oscar Salvador > SUSE L3