On 9/17/16 12:25 AM, Dave Hansen wrote:
> That's an interesting data point, but it still doesn't quite explain
> what is going on. It seems like there might be parts of gigantic pages
> that have PageHuge() set on tail pages, while other parts don't. If
> that's true, we have another bug and your patch just papers over the
> issue. I think you really need to find the root cause before we apply
> this patch.
The root cause is that the test script (tools/testing/selftests/memory-hotplug/mem-on-off-test.sh) changes the online/offline status of memory blocks that do not contain a huge page's head page. It *randomly* selects 10% of the memory blocks under /sys/devices/system/memory/memory* and changes their online/offline status.

On my system, the memory block size is 0x10000000:

    [root@elvis-n01-kvm memory]# cat block_size_bytes
    10000000

But the huge page size (16G) is larger than this memory block size, so one huge page is composed of several memory blocks, for example memory704, memory705, memory706 and so on. memory704 will then contain the head page, but memory705 will contain *only* tail pages. So the problem happens if we run:

    # echo offline > memory705/state

That's why we need a PageHead() check now, and why this problem does not happen on systems with smaller huge pages such as 16M.

As for PageHuge(), I think it returns true for all tail pages, because it first looks up the compound head of the tail page and then checks the head's huge page flag:

    page = compound_head(page);

As for the failure message: if a memory block is in use, offlining it returns a failure.
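
To make the arithmetic above concrete, here is a small user-space sketch in C. It is not kernel code: `struct toy_page` and the `toy_*` helpers are hypothetical stand-ins that only mimic the behaviour described for compound_head(), PageHead() and PageHuge(). It also assumes that memoryN covers the physical range starting at N * block_size_bytes and that the huge page starts at the beginning of memory704, as in the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value reported by block_size_bytes in the mail (hex 10000000 = 256 MiB). */
#define BLOCK_SIZE 0x10000000ULL

/* Number of memory blocks that a huge page of the given size spans. */
uint64_t blocks_spanned(uint64_t huge_page_size)
{
    assert(huge_page_size > 0);
    return (huge_page_size + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Assumption: memory<block_id> covers [block_id * BLOCK_SIZE, +BLOCK_SIZE). */
bool block_contains_head(uint64_t block_id, uint64_t huge_page_start)
{
    uint64_t block_start = block_id * BLOCK_SIZE;
    return huge_page_start >= block_start &&
           huge_page_start < block_start + BLOCK_SIZE;
}

/* True if any part of the huge page lies inside memory<block_id>. */
bool block_overlaps_huge_page(uint64_t block_id, uint64_t huge_page_start,
                              uint64_t huge_page_size)
{
    uint64_t block_start = block_id * BLOCK_SIZE;
    return block_start < huge_page_start + huge_page_size &&
           huge_page_start < block_start + BLOCK_SIZE;
}

/* Toy stand-in for struct page; hypothetical, for illustration only. */
struct toy_page {
    struct toy_page *head; /* points to itself on the head page */
    bool hugetlb;          /* only meaningful on the head page */
};

struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->head;
}

/* Mimics PageHead(): true only for the head page itself. */
bool toy_page_head(struct toy_page *page)
{
    return page->head == page;
}

/* Mimics PageHuge() as described above: resolve the head, then test its flag. */
bool toy_page_huge(struct toy_page *page)
{
    page = toy_compound_head(page);
    return page->hugetlb;
}
```

Under these assumptions, blocks_spanned(16ULL << 30) gives 64 blocks for a 16G page and 1 block for a 16M page, and only memory704 contains the head page while memory705 overlaps the huge page with tail pages only. For a tail page, toy_page_huge() is true while toy_page_head() is false, which is the situation the PageHead() check is meant to catch.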