On Thu, Jul 4, 2024 at 1:42 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Barry Song <21cnbao@xxxxxxxxx> writes:
>
> > On Wed, Jul 3, 2024 at 6:33 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>
> >
> > Ying, thanks!
> >
> >> Barry Song <21cnbao@xxxxxxxxx> writes:
> >
> > [snip]
> >
> >> > This patch introduces mTHP swap-in support. For now, we limit mTHP
> >> > swap-ins to contiguous swaps that were likely swapped out from mTHP as
> >> > a whole.
> >> >
> >> > Additionally, the current implementation only covers the SWAP_SYNCHRONOUS
> >> > case. This is the simplest and most common use case, benefiting millions
> >>
> >> I admit that Android is an important target platform of Linux kernel.
> >> But I will not advocate that it's MOST common ...
> >
> > Okay, I understand that there are still many embedded systems similar
> > to Android, even if they are not Android :-)
> >
> >>
> >> > of Android phones and similar devices with minimal implementation
> >> > cost. In this straightforward scenario, large folios are always exclusive,
> >> > eliminating the need to handle complex rmap and swapcache issues.
> >> >
> >> > It offers several benefits:
> >> > 1. Enables bidirectional mTHP swapping, allowing retrieval of mTHP after
> >> >    swap-out and swap-in.
> >> > 2. Eliminates fragmentation in swap slots and supports successful THP_SWPOUT
> >> >    without fragmentation. Based on the observed data [1] on Chris's and Ryan's
> >> >    THP swap allocation optimization, aligned swap-in plays a crucial role
> >> >    in the success of THP_SWPOUT.
> >> > 3. Enables zRAM/zsmalloc to compress and decompress mTHP, reducing CPU usage
> >> >    and enhancing compression ratios significantly. We have another patchset
> >> >    to enable mTHP compression and decompression in zsmalloc/zRAM[2].
> >> >
> >> > Using the readahead mechanism to decide whether to swap in mTHP doesn't seem
> >> > to be an optimal approach.
> >> > There's a critical distinction between pagecache
> >> > and anonymous pages: pagecache can be evicted and later retrieved from disk,
> >> > potentially becoming an mTHP upon retrieval, whereas anonymous pages must
> >> > always reside in memory or swapfile. If we swap in small folios and identify
> >> > adjacent memory suitable for swapping in as mTHP, those pages that have been
> >> > converted to small folios may never transition to mTHP. The process of
> >> > converting mTHP into small folios remains irreversible. This introduces
> >> > the risk of losing all mTHP through several swap-out and swap-in cycles,
> >> > let alone losing the benefits of defragmentation, improved compression
> >> > ratios, and reduced CPU usage based on mTHP compression/decompression.
> >>
> >> I understand that the optimal policy in your use cases may be
> >> always swapping in mTHP at the highest order. But it may not be in some
> >> other use cases. For example, relatively slow swap devices, non-fault
> >> sub-pages swapped out again before usage, etc.
> >>
> >> So, IMO, the default policy should be the one that can adapt to the
> >> requirements automatically. For example, if most non-fault sub-pages
> >> will be read/written before being swapped out again, we should swap in
> >> at a larger order, otherwise at a smaller order. Swap readahead is one
> >> possible way to do that. But, I admit that this may not work perfectly
> >> in your use cases.
> >>
> >> Previously I hoped that we could start with this automatic policy that
> >> helps everyone, then check whether it can satisfy your requirements
> >> before implementing the optimal policy for you. But it appears that you
> >> don't agree with this.
> >>
> >> Based on the above, IMO, we should not use your policy as default at
> >> least for now. A user space interface can be implemented to select
> >> different swap-in order policy, similar to that of mTHP allocation order
> >> policy.
> >> We need a different policy because the performance characteristics
> >> of memory allocation are quite different from those of swap-in. For
> >> example, SSD reads could be much slower than memory
> >> allocation. With the policy selection, I think that we can implement
> >> mTHP swap-in for non-SWAP_SYNCHRONOUS too. Users need to know what they
> >> are doing.
> >
> > Agreed. Ryan also suggested something similar before.
> > Could we add this user policy by:
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-<size>/swapin_enabled
> > which could be 0 or 1, I assume we don't need so many "always inherit
> > madvise never"?
> >
> > Do you have any suggestions regarding the user interface?
>
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/swapin_enabled
>
> looks good to me. To be consistent with "enabled" in the same
> directory, and more importantly, to be extensible, I think that it's
> better to start with at least "always never". I believe that we will
> add "auto" in the future to tune automatically, which can eventually
> be used as the default.

Sounds good to me. Thanks!

>
> --
> Best Regards,
> Huang, Ying

Barry
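
For illustration only, here is a rough user-space sketch of how the knob discussed above might be driven if it landed with the "always never" values. Everything in it is an assumption: swapin_enabled is only a proposal in this thread, the "kB" suffix on the directory name is borrowed from the existing per-size "enabled" files, and the helper names (swapin_knob_path, set_swapin_policy) are made up for this example. "auto" is rejected here because the thread only mentions it as a possible future value.

```python
import os

# Hypothetical value set: the thread proposes starting with "always never".
# "auto" is only floated as a future addition, so it is not accepted here.
SWAPIN_POLICIES = ("always", "never")

THP_SYSFS_ROOT = "/sys/kernel/mm/transparent_hugepage"


def swapin_knob_path(size_kb, root=THP_SYSFS_ROOT):
    """Build the path of the proposed per-size swapin_enabled file.

    Assumes the directory is named hugepages-<size>kB, as for "enabled".
    """
    return os.path.join(root, f"hugepages-{size_kb}kB", "swapin_enabled")


def set_swapin_policy(size_kb, policy, root=THP_SYSFS_ROOT):
    """Write a policy string to the proposed knob for one mTHP size."""
    if policy not in SWAPIN_POLICIES:
        raise ValueError(f"unsupported policy: {policy!r}")
    path = swapin_knob_path(size_kb, root)
    with open(path, "w") as f:
        f.write(policy)
    return path
```

The root parameter exists only so the sketch can be exercised against a scratch directory; on a real system the write would need root privileges and a kernel that actually provides the file.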