In commit eef1b3ba053a ("thp: implement split_huge_pmd()"):

  "Original split_huge_page() combined two operations: splitting
  PMDs into tables of PTEs and splitting underlying compound page.
  This patch implements split_huge_pmd() which split given PMD
  without splitting other PMDs this page mapped with or underlying
  compound page."

In this situation, suppose a process is allocated a large number of
transparent huge pages and later releases part of their memory. The
memory charged to the process decreases after split_huge_pmd(), but
the free memory of the system may not increase, because the
underlying compound pages have not been split. In addition, the rss
in the memory.stat of the cgroup the process belongs to is much
larger than expected. This causes some problems:

- Users cannot get the exact amount of free memory when evaluating
  the system's workload.
- The memory usage of a service becomes unstable due to
  unpredictable partial unmaps of transparent huge pages, and we
  cannot tell whether there is a memory leak or some other problem.
  (A userspace reproducer is sketched at the end of this mail.)

Here is an example:

# cat memory.stat
...
rss 297230336
rss_huge 230686720
...

# echo 2 > /proc/sys/vm/drop_caches
(this can split some transparent huge pages)

# cat memory.stat
...
rss 118128640
rss_huge 27262976
...

As memory.stat shows, the memory usage reported before the huge
pages are split is more than twice the actual memory usage.

Two possible solutions:

- Provide split_huge_page_pmd() again and add a sysfs interface for
  users to choose between split_huge_page_pmd() and split_huge_pmd()
  when releasing memory of transparent huge pages.
- Add a statistics item to /proc/meminfo and the memory cgroup that
  shows how much memory has been released by partial unmaps, so that
  users can calculate the actual free memory of the current system
  (a rough sketch of this option follows below).

I haven't implemented the patch yet. Hope there's a better solution.
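
For reference, here is a minimal userspace reproducer (a sketch, not
a finished test: it assumes a 2 MiB PMD size, THP enabled in "always"
or "madvise" mode, and simplified /proc parsing). It faults in
THP-backed memory and then unmaps the second half of every huge page:
the process's AnonHugePages and VmRSS drop, but the compound pages
are only queued for deferred splitting, so system-wide MemFree does
not grow by the same amount until the shrinkers run (e.g. via
echo 2 > /proc/sys/vm/drop_caches, as in the example above).

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE	(2UL << 20)	/* assumed PMD size: 2 MiB */
#define LEN	(64 * HPAGE)	/* 64 huge pages */

/* Sum all "<key>: ... kB" lines in a /proc file. */
static long read_kb(const char *path, const char *key)
{
	FILE *f = fopen(path, "r");
	char line[256];
	long kb, total = 0;
	size_t klen = strlen(key);

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, key, klen) &&
		    sscanf(line + klen, ": %ld kB", &kb) == 1)
			total += kb;
	fclose(f);
	return total;
}

int main(void)
{
	/* Over-allocate by one huge page so we can align to HPAGE,
	 * since mmap() only guarantees base-page alignment. */
	char *raw = mmap(NULL, LEN + HPAGE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *p;

	if (raw == MAP_FAILED)
		return 1;
	p = (char *)(((uintptr_t)raw + HPAGE - 1) & ~(HPAGE - 1));

	madvise(p, LEN, MADV_HUGEPAGE);	/* ask for THPs */
	memset(p, 1, LEN);		/* fault everything in */
	printf("before: AnonHugePages %ld kB, VmRSS %ld kB\n",
	       read_kb("/proc/self/smaps", "AnonHugePages"),
	       read_kb("/proc/self/status", "VmRSS"));

	/*
	 * Unmap the second half of every huge page.  split_huge_pmd()
	 * splits the PMDs, so VmRSS drops by about half, but the
	 * compound pages are only queued for deferred splitting:
	 * system-wide MemFree does not grow correspondingly yet.
	 */
	for (size_t off = HPAGE / 2; off < LEN; off += HPAGE)
		munmap(p + off, HPAGE / 2);

	printf("after:  AnonHugePages %ld kB, VmRSS %ld kB\n",
	       read_kb("/proc/self/smaps", "AnonHugePages"),
	       read_kb("/proc/self/status", "VmRSS"));
	return 0;
}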
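
For the second option, a very rough sketch of what the accounting
might look like, modeled on the existing per-node stat machinery
(mod_node_page_state() and friends). NR_ANON_PARTIAL_THPS and the
meminfo field name are hypothetical, the deltas are approximate (an
exact figure would need to track how much of each queued THP is
still mapped), and this is illustration rather than a working patch:

/*
 * Hypothetical sketch of option 2.  NR_ANON_PARTIAL_THPS is an
 * invented name; the hook mirrors where the kernel already queues
 * partially unmapped THPs for deferred splitting.
 */

/* include/linux/mmzone.h: a new per-node counter (hypothetical) */
enum node_stat_item {
	/* ... existing items ... */
	NR_ANON_PARTIAL_THPS,	/* THPs waiting for deferred split */
};

/* mm/huge_memory.c: when a partially unmapped THP is queued ... */
void deferred_split_huge_page(struct page *page)
{
	/* ... existing queueing logic, then: */
	mod_node_page_state(page_pgdat(page), NR_ANON_PARTIAL_THPS,
			    HPAGE_PMD_NR);
}

/*
 * ... and subtract HPAGE_PMD_NR again when the page is actually
 * split or freed, so that a "PartialTHP:" line in /proc/meminfo
 * (and a matching memcg stat in memory.stat) would tell users
 * roughly how much memory a deferred split could still give back.
 */

With such a counter, the "actual" free memory could be estimated as
MemFree plus the partially unmapped amount, instead of guessing from
the gap between rss and rss_huge.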