> TAO is an umbrella project aiming at a better economy of physical
> contiguity viewed as a valuable resource. A few examples are:
> 1. A multi-tenant system can have guaranteed THP coverage while
>    hosting abusers/misusers of the resource.
> 2. Abusers/misusers, e.g., workloads excessively requesting and then
>    splitting THPs, should be punished if necessary.
> 3. Good citizens should be awarded with, e.g., lower allocation
>    latency and less cost of metadata (struct page).

I think TAO or a similar optimization in buddy is essential to the
success of mTHP. Ryan's recent mTHP work can bring multi-size large
folios to a wide range of products for which THP might be too large.
But one pain point is that the buddy allocator on a real device with
limited memory can become seriously fragmented after it has run for
some time.

We (OPPO) have actually brought up mTHP-like features on millions of
phones, even on the 5.4, 5.10, 5.15 and 6.1 kernels, with large folios
of 64KiB to leverage ARM64's CONT-PTE. The open source code for
kernel 6.1 is available here[1].

We found that the success rate of 64KiB allocation could be very low
after running monkey[2] on phones for one hour. The data below was
collected from 60 mins to 120 mins, i.e. during the second hour, after
the phone had already been running for one hour.

Without a TAO-like optimization to the existing buddy, 64KiB large
folio allocation falls back to small folios at a rate of 92.35% in
do_anonymous_page():

thp_do_anon_pages_fallback / (thp_do_anon_pages + thp_do_anon_pages_fallback)
25807330 / 27944523 = 0.9235

In do_anonymous_page(), thp_do_anon_pages_fallback is the number of
times we tried to allocate 64KiB and failed, and thus used small
folios instead; thp_do_anon_pages is the number of times we tried to
allocate 64KiB and succeeded.

So this number more or less means mTHP has lost the vast majority of
its value on a fragmented system, and fragmentation is always the case
for a phone.

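To make the arithmetic above explicit, here is a minimal sketch in
Python. It is not part of our kernel code: fallback_rate() is a
hypothetical helper, and the success count is derived from the two
figures quoted above rather than being reported separately.

```python
def fallback_rate(successes: int, fallbacks: int) -> float:
    """Fraction of 64KiB allocation attempts that fell back to small folios."""
    attempts = successes + fallbacks
    if attempts == 0:
        # No attempts recorded: report 0.0 instead of dividing by zero.
        return 0.0
    return fallbacks / attempts


# Figures quoted above for the second hour of the monkey run.
thp_do_anon_pages_fallback = 25807330
total_attempts = 27944523
# Derived value (assumption: total = successes + fallbacks, as in the formula).
thp_do_anon_pages = total_attempts - thp_do_anon_pages_fallback

rate = fallback_rate(thp_do_anon_pages, thp_do_anon_pages_fallback)
print(f"{rate:.4f}")
```

Running it prints 0.9235, matching the figure above; only about 7.65%
of the 64KiB attempts actually got a large folio.
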
This has actually pushed us to implement a similar optimization to
avoid splitting 64KiB and to reward 64KiB allocation with lower
latency. Our implementation differs from TAO: rather than adding new
zones, we add migration_types to mark some pageblocks as dedicated to
mTHP allocation, and we avoid splitting them into lower orders except
in some corner cases.

This has significantly improved our success rate of 64KiB large folio
allocation and decreased the latency, which helped large folios to
finally be deployed in real products.

[1] https://github.com/OnePlusOSS/android_kernel_oneplus_sm8650/blob/oneplus/sm8650_u_14.0.0_oneplus12/
[2] https://developer.android.com/studio/test/other-testing-tools/monkey

> 4. Better interoperability with userspace memory allocators when
>    transacting the resource.
>
> This project puts the same emphasis on the established use case for
> servers and the emerging use case for clients so that client workloads
> like Android and ChromeOS can leverage the recent multi-sized THPs
> [1][2].
>
> Chapter One introduces the cornerstone of TAO: an abstraction called
> policy (virtual) zones, which are overlayed on the physical zones.
> This is in line with item 1 above.
>
> A new door is open after Chapter One. The following two chapters
> discuss the reverse of THP collapsing, called THP shattering, and THP
> HVO, which brings the hugeTLB feature [3] to THP. They are in line
> with items 2 & 3 above.
>
> Advanced use cases are discussed in Epilogue, since they require the
> cooperation of userspace memory allocators. This is in line with item
> 4 above.
>
> [1] https://lwn.net/Articles/932386/
> [2] https://lwn.net/Articles/937239/
> [3] https://www.kernel.org/doc/html/next/mm/vmemmap_dedup.html
>
> Yu Zhao (4):
>   THP zones: the use cases of policy zones
>   THP shattering: the reverse of collapsing
>   THP HVO: bring the hugeTLB feature to THP
>   Profile-Guided Heap Optimization and THP fungibility

Thanks
Barry