On Tue, Aug 11, 2020 at 01:39:24PM -0400, Qian Cai wrote:
> On Tue, Aug 11, 2020 at 03:11:40AM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > I'm still not sure why the test succeeded by reverting these because
> > current mainline kernel provides similar mechanism to prevent reuse of
> > soft offlined page. So this success seems to me something suspicious.
> >
> > To investigate more, I want to have additional info about the page states
> > of the relevant pages after soft offlining. Could you collect it by the
> > following steps?
> >
> > - modify random.c not to run hotplug_memory() in migrate_huge_hotplug_memory(),
> > - compile it and run "./random 1" once,
> > - to collect page state with hwpoisoned pages, run "./page-types -Nlr -b hwpoison",
> >   where page-types is available under tools/vm in kernel source tree.
> > - choose a few pfns of soft offlined pages from kernel message
> >   "Soft offlining pfn ...", and run "./page-types -Nlr -a <pfn>".
>
> # ./page-types -Nlr -b hwpoison
> offset  len  flags
> 99a000  1    __________B________X_______________________
> 99c000  1    __________B________X_______________________
> 99e000  1    __________B________X_______________________
> 9a0000  1    __________B________X_______________________
> ba6000  1    __________B________X_______________________
> baa000  1    __________B________X_______________________

Thank you. The output shows only 6 records, which is unexpected to me
because random.c soft-offlines 2 hugepages with madvise() 1000 times.
Could the other hwpoisoned pages have been cleared somehow (maybe in an
arch-specific way)? If they really were, the success of this test is
spurious, and this patchset can be considered a fix.
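
As a rough illustration of the count check above, the sketch below parses
"page-types -Nlr -b hwpoison" output and sets the number of hwpoisoned
records against the number of soft-offline attempts (2 hugepages x 1000
iterations, taken from the figures in this mail). It assumes each record
line is "<hex offset> <len> <flags>" and that 'B' and 'X' in the flags
string mean buddy and hwpoison; that is my reading of tools/vm/page-types.c.
The helper names are hypothetical, and how many distinct pfns should
actually appear is not stated in this thread.

```python
# Sketch only: summarize "page-types -Nlr -b hwpoison" list output.
# Assumptions (not confirmed in this thread):
#   - record lines have three fields: hex offset (pfn), len, flags
#   - 'B' in the flags string means buddy, 'X' means hwpoison

SAMPLE = """\
offset  len  flags
99a000  1    __________B________X_______________________
99c000  1    __________B________X_______________________
99e000  1    __________B________X_______________________
9a0000  1    __________B________X_______________________
ba6000  1    __________B________X_______________________
baa000  1    __________B________X_______________________
"""


def parse_records(text):
    """Return a list of (pfn, length, flags) tuples from the list output."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank lines, shell prompts, anything unexpected
        offset, length, flags = fields
        try:
            records.append((int(offset, 16), int(length), flags))
        except ValueError:
            continue  # the "offset len flags" header is not numeric
    return records


def summarize(text):
    """Count hwpoisoned pages, and how many of them are also in buddy."""
    records = parse_records(text)
    poisoned = [r for r in records if "X" in r[2]]
    in_buddy = [r for r in poisoned if "B" in r[2]]
    return {
        "records": len(records),
        "hwpoison_pages": sum(r[1] for r in poisoned),
        "hwpoison_in_buddy": sum(r[1] for r in in_buddy),
    }


if __name__ == "__main__":
    hugepages_per_iteration = 2   # from the mail
    iterations = 1000             # from the mail
    attempts = hugepages_per_iteration * iterations
    summary = summarize(SAMPLE)
    print("soft-offline attempts:", attempts)
    print("hwpoison pages listed:", summary["hwpoison_pages"])
    print("of which in buddy:", summary["hwpoison_in_buddy"])
```

To run it against a live system, the SAMPLE string could be replaced with
the captured output of the page-types command.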
>
> Every single one of pfns was like this,
>
> # ./page-types -Nlr -a 0x99a000
> offset  len  flags
> 99a000  1    __________B________X_______________________
>
> # ./page-types -Nlr -a 0x99e000
> offset  len  flags
> 99e000  1    __________B________X_______________________
>
> # ./page-types -Nlr -a 0x99c000
> offset  len  flags
> 99c000  1    __________B________X_______________________