On 10/19/2017 07:30 PM, Naoya Horiguchi wrote: > On Thu, Oct 19, 2017 at 04:00:07PM -0700, Mike Kravetz wrote: > > Thank you for addressing this. The patch itself looks good to me, but > the reported issue (negative reserve count) doesn't reproduce in my trial > with v4.14-rc5, so could you share the exact procedure for this issue? Sure, but first one question on your test scenario below. > > When error handler runs over a huge page, the reserve count is incremented > so I'm not sure why the reserve count goes negative. I'm not sure I follow. What specific code is incrementing the reserve count? > My operation is like below: > > $ sysctl vm.nr_hugepages=10 > $ grep HugePages_ /proc/meminfo > HugePages_Total: 10 > HugePages_Free: 10 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > $ ./test_alloc_generic -B hugetlb_file -N1 -L "mmap access memory_error_injection:error_type=madv_hard" // allocate a 2MB file on hugetlbfs, then madvise(MADV_HWPOISON) on it. > $ grep HugePages_ /proc/meminfo > HugePages_Total: 10 > HugePages_Free: 9 > HugePages_Rsvd: 1 // reserve count is incremented > HugePages_Surp: 0 This is confusing to me. I can not create a test where there is a reserve count after poisoning page. I tried to recreate your test. Running unmodified 4.14.0-rc5. Before test ----------- HugePages_Total: 1 HugePages_Free: 1 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB After open(creat) and mmap of 2MB hugetlbfs file ------------------------------------------------ HugePages_Total: 1 HugePages_Free: 1 HugePages_Rsvd: 1 HugePages_Surp: 0 Hugepagesize: 2048 kB Reserve count is 1 as expected/normal After madvise(MADV_HWPOISON) of the single huge page in mapping/file -------------------------------------------------------------------- HugePages_Total: 1 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB In this case, the reserve (and free) count were decremented. Note that before the poison operation the page was not associated with the mapping/ file. I did not look closely at the code, but assume the madvise may cause the page to be 'faulted in'. The counts remain the same when the program exits ------------------------------------------------- HugePages_Total: 1 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB Remove the file (rm /var/opt/oracle/hugepool/foo) ------------------------------------------------- HugePages_Total: 1 HugePages_Free: 0 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0 Hugepagesize: 2048 kB I am still confused about how your test maintains a reserve count after poisoning. It may be a good idea for you to test my patch with your test scenario as I can not recreate here. -- Mike Kravetz