On Tue, Aug 17, 2021 at 10:02 PM HORIGUCHI NAOYA(堀口 直也)
<naoya.horiguchi@xxxxxxx> wrote:
>
> On Mon, Aug 16, 2021 at 01:24:25PM -0700, Yang Shi wrote:
> > On Mon, Aug 16, 2021 at 12:38 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Aug 16, 2021 at 11:09:08AM -0700, Yang Shi wrote:
> > > > But the most disappointing thing is all the effort doesn't make the page
> > > > offline, it just returns:
> > > >
> > > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > >
> > > It's a shame it doesn't call dump_page(). There might be more
> > > interesting information somewhere in struct page that would help us
> > > figure out what kind of page it was in your environment. For example,
> > > it might be a page table page or a page allocated for vmalloc(), and
> > > in both those cases, there are things we might be able to do (we'd
> > > certainly be able to figure out that it isn't worth shrinking slab!)
> >
> > Yes, dump_page() could provide more information to us. I could add a
> > new patch or just update this patch to call dump_page() if offline is
> > failed if the hwpoison maintainer agrees to this as well.
>
> I agree with showing more information in failure case.

Thanks for the input. From reading the code, it seems get_any_page() is
called to shake the page for both soft offline and memory_failure(), so
it looks like a good place to call dump_page() when -EIO is going to be
returned, which means hwpoison can't handle the page. Otherwise we may
need to call dump_page() in a couple of different places.

dump_page() would then be called with pcp disabled and the memory
hotplug lock held when it is called from get_any_page(), but I suppose
that should not be a big deal.

>
> - Naoya Horiguchi
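
To make the placement argument concrete, here is a rough userspace model
of the idea, not kernel code and not a patch. It only shows why one
dump on the -EIO path of a shared helper covers both callers. All the
*_model names, the struct, and the "handleable" field are hypothetical
stand-ins; the real get_any_page() and dump_page() have different
signatures and far more logic.

```c
/* Userspace model only: the names echo the kernel functions discussed
 * in this thread, but every body here is a hypothetical stand-in. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>

struct page_model {
	unsigned long pfn;
	unsigned long long flags;
	int handleable;	/* assumption: nonzero if hwpoison can handle the page */
};

/* Counts dump calls so a caller can observe when a dump happened. */
static int dump_count;

/* Stand-in for dump_page(): prints the little this model knows. */
static void dump_page_model(const struct page_model *p, const char *reason)
{
	dump_count++;
	fprintf(stderr, "page:%#lx flags:%#llx (%s)\n",
		p->pfn, p->flags, reason);
}

/* Stand-in for get_any_page(): both entry points go through here, so a
 * single dump on the -EIO path covers soft offline and memory failure. */
static int get_any_page_model(const struct page_model *p)
{
	assert(p != NULL);
	if (p->handleable)
		return 1;	/* assumption: a reference was taken */
	dump_page_model(p, "hwpoison: unhandlable page");
	return -EIO;
}

/* Stand-in for the soft offline path. */
static int soft_offline_model(const struct page_model *p)
{
	int ret = get_any_page_model(p);

	return ret < 0 ? ret : 0;
}

/* Stand-in for the memory_failure() path. */
static int memory_failure_model(const struct page_model *p)
{
	int ret = get_any_page_model(p);

	return ret < 0 ? ret : 0;
}
```

In this model neither caller contains a dump call of its own, which is
the point: the alternative is repeating the call at each failure site.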