On 09/20/2016 07:45 AM, Rui Teng wrote:
> On 9/17/16 12:25 AM, Dave Hansen wrote:
>>
>> That's an interesting data point, but it still doesn't quite explain
>> what is going on.
>>
>> It seems like there might be parts of gigantic pages that have
>> PageHuge() set on tail pages, while other parts don't.  If that's true,
>> we have another bug and your patch just papers over the issue.
>>
>> I think you really need to find the root cause before we apply this
>> patch.
>>
> The root cause is that the test script
> (tools/testing/selftests/memory-hotplug/mem-on-off-test.sh) changes the
> online/offline status of memory blocks other than the one containing
> the head page.  It *randomly* selects 10% of the memory blocks from
> /sys/devices/system/memory/memory* and changes their online/offline
> status.

Ahh, that does explain it!  Thanks for digging into that!

> That's why we need a PageHead() check now, and why this problem does
> not happen on systems with smaller huge pages such as 16M.
>
> As for PageHuge(), I think it will return true for all tail pages,
> because it gets the compound_head for a tail page and then reads its
> huge page flag:
> 	page = compound_head(page);
>
> As for the failure message, if a memory block is in use, offlining
> it will return failure.

That's good, but aren't we still left with a situation where we've
offlined and dissolved the _middle_ of a gigantic huge page while the
head page is still in place and online?

That seems bad.