On Wed, Sep 22, 2021 at 12:58 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Wed, Sep 22, 2021 at 12:37 PM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> >
> > On Wed, Aug 18, 2021 at 10:41:16PM -0700, Yang Shi wrote:
> > > Currently just very simple message is shown for unhandlable page, e.g.
> > > non-LRU page, like:
> > > soft_offline: 0x1469f2: unknown non LRU page type 5ffff0000000000 ()
> > >
> > > It is not very helpful for further debug, calling dump_page() could show
> > > more useful information.
> >
> > Looks like your code already caught something. An error injection
> > test may have injected into a shared library. Though I'm not sure that
> > the refcount/mapcount in the dump agrees with that diagnosis from the
> > author of this test.
>
> The messages from dump_page() are (unwind them from mce logs):
>
> [ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
> [ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> [ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
> [ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000

I missed one line from the dump:

[ 4818.321804] page dumped because: hwpoison: unhandlable page

Anyway, dump_page() is only called when an unhandlable page is encountered.

>
> The page flags tell it is a "reserved" page and mapping is NULL. It
> doesn't seem like a user page or movable page, so hwpoison can't
> handle it so that the messages are dumped.
>
> >
> > Here's what appeared on the console:
> >
> > [ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
> > [ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
> > [ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
> > [ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > [ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.321804] page dumped because: hwpoison: unhandlable page
> > [ 4818.564802] mce: Uncorrected hardware memory error in user-access at cef2747000
> > [ 4818.573043] Memory failure: 0xcef2747: recovery action for unknown page: Ignored
> > [ 4818.595837] Memory failure: 0xcef2747: already hardware poisoned
> > [ 4818.603245] Memory failure: 0xcef2747: Sending SIGBUS to multichase:67460 due to hardware memory corruption
> > [ 4818.614297] Memory failure: 0xcef2747: already hardware poisoned
> >
> > -Tony
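
For anyone who wants to untangle a similar console capture, here is a rough Python sketch of how the dump_page() lines might be separated from the interleaved mce lines. It is only an illustration, not something from the patch: the helper names (split_dump, pfn_to_phys) are made up for this example, and PAGE_SHIFT = 12 assumes 4 KiB pages. The log excerpt is copied from the console output quoted above.

```python
import re

# Assumption: 4 KiB pages (PAGE_SHIFT = 12), as is usual on x86-64.
PAGE_SHIFT = 12

# Excerpt of the console output quoted in this thread.
CONSOLE_LOG = """\
[ 4817.622254] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.630520] page:000000003ab9dca4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xcef2747
[ 4817.638651] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4817.646860] flags: 0x57ffffc0801000(reserved|hwpoison|node=1|zone=2|lastcpupid=0x1fffff)
[ 4818.025515] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.033689] raw: 0057ffffc0801000 ffd400033bc9d1c8 ffd400033bc9d1c8 0000000000000000
[ 4818.272435] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.280640] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 4818.280658] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.313606] mce: Uncorrected hardware memory error in user-access at cef2747000
[ 4818.321804] page dumped because: hwpoison: unhandlable page
"""

# Matches the "[ seconds.micros] " prefix that printk puts on each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def split_dump(log):
    """Hypothetical helper: return (dump_lines, mce_lines) in original order."""
    dump, mce = [], []
    for line in log.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        body = TIMESTAMP.sub("", line)
        (mce if body.startswith("mce:") else dump).append(line)
    return dump, mce


def pfn_to_phys(pfn, page_shift=PAGE_SHIFT):
    """Hypothetical helper: physical address of the first byte of a page frame."""
    return pfn << page_shift


if __name__ == "__main__":
    dump_lines, mce_lines = split_dump(CONSOLE_LOG)
    for line in dump_lines:
        print(line)

    # Cross-check: the pfn in the dump should cover the address mce reported.
    pfn = int(re.search(r"pfn:(0x[0-9a-f]+)", dump_lines[0]).group(1), 16)
    mce_addr = int(mce_lines[0].rsplit(" ", 1)[1], 16)
    print(hex(pfn_to_phys(pfn)), hex(mce_addr))
```

The pfn cross-check is just a sanity test that the dumped page and the mce reports refer to the same frame; with a different page size the shift would need to change.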