Hi,

I'd like to propose a session about the allocation and reclamation of
mTHP. This is related to Yu Zhao's TAO[1] but not the same.

OPPO has implemented mTHP-like large folios across thousands of real
Android devices, utilizing ARM64 CONT-PTE. However, we've encountered
challenges:

- The allocation of mTHP isn't reliable; even after prolonged use,
  obtaining large folios remains uncertain. For instance, after a few
  hours of operation, the likelihood of successfully allocating large
  folios on a phone may decrease to just 2%.

- Mixing large and small folios in the same LRU list can lead to mutual
  blocking and unpredictable latency during reclamation/allocation. For
  instance, if you require large folios, the LRU list's tail could be
  filled with small folios.

  LRU (LF - large folio, SF - small folio):

  LF - LF - LF - SF - SF - SF - SF - SF - SF - SF - SF - SF - SF - SF - SF - SF

  You might end up reclaiming many small folios yet still struggle to
  allocate large folios. Conversely, the inverse scenario can occur when
  the LRU list's tail is populated with large folios.

  SF - SF - SF - LF - LF - LF - LF - LF - LF - LF - LF - LF - LF - LF - LF - LF

In OPPO's products, we allocate dedicated pageblocks solely for large
folio allocation, and we've fine-tuned the LRU mechanism to support a
dual LRU: one for small folios and another for large ones. Dedicated
pageblocks offer a fundamental guarantee of allocating large folios.
Additionally, segregating small and large folios into two LRUs ensures
that both can be efficiently reclaimed for their respective users'
requests. However, the implementation isn't pretty and is primarily
tailored for product purposes, so it isn't fully upstreamable.

You can obtain the architectural diagram of OPPO's approach from
link[2].

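
To make the two LRU layouts above concrete, here is a toy Python model.
It is only a sketch, not kernel code and not OPPO's implementation: the
names reclaim_until() and split_by_kind() are hypothetical, and it
assumes the pessimistic case in which reclaiming small folios never
frees a contiguous range usable for a large folio. It counts how many
folios must be reclaimed from the LRU tail before one folio of the
wanted kind is freed, with a single mixed LRU and with a dual LRU.

```python
from collections import deque


def reclaim_until(lru, wanted):
    """Reclaim from the tail (right end) until a folio of kind `wanted`
    has been freed. Returns the number of folios reclaimed, or None if
    the list holds no folio of that kind."""
    reclaimed = 0
    while lru:
        kind = lru.pop()          # tail of the LRU = coldest folio
        reclaimed += 1
        if kind == wanted:
            return reclaimed
    return None


def split_by_kind(lru):
    """Hypothetical dual-LRU split: one list per folio size,
    preserving the relative order of the mixed list."""
    large = deque(f for f in lru if f == "LF")
    small = deque(f for f in lru if f == "SF")
    return large, small


# First layout from the mail: head is on the left, tail on the right.
mixed_small_tail = deque(["LF"] * 3 + ["SF"] * 13)
single_cost_lf = reclaim_until(deque(mixed_small_tail), "LF")   # 14
large_lru, _ = split_by_kind(mixed_small_tail)
dual_cost_lf = reclaim_until(large_lru, "LF")                   # 1

# Inverse layout: a small-folio request is stuck behind large folios.
mixed_large_tail = deque(["SF"] * 3 + ["LF"] * 13)
single_cost_sf = reclaim_until(deque(mixed_large_tail), "SF")   # 14
_, small_lru = split_by_kind(mixed_large_tail)
dual_cost_sf = reclaim_until(small_lru, "SF")                   # 1

print(single_cost_lf, dual_cost_lf, single_cost_sf, dual_cost_sf)
```

In this model a request on the mixed LRU evicts 14 folios before it is
satisfied, while the same request on its own list evicts one. Real
reclaim is of course more nuanced (folio age, references, splitting),
so the numbers only illustrate the blocking effect, not its magnitude.
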
Therefore, my plan is to:

- Introduce the architecture of OPPO's mTHP-like approach, which
  encompasses additional optimizations we've made to address swap
  fragmentation issues and improve swap performance, such as dual-zRAM
  and compression/decompression of large folios[3].

- Present OPPO's method of utilizing dedicated pageblocks and a
  dual-LRU system for mTHP.

- Share our observations from employing Yu Zhao's TAO on Pixel 6 phones.

- Discuss our future direction: are we leaning towards TAO or dedicated
  pageblocks? If we opt for pageblocks, how do we plan to resolve the
  LRU issue?

[1] https://lore.kernel.org/linux-mm/20240229183436.4110845-1-yuzhao@xxxxxxxxxx/
[2] https://github.com/21cnbao/mTHP/blob/main/largefoliosarch.png
[3] https://lore.kernel.org/linux-mm/20240327214816.31191-1-21cnbao@xxxxxxxxx/

Thanks,
Barry