On Sat, Mar 16, 2024 at 1:06 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 15/03/2024 10:01, Barry Song wrote:
> > On Fri, Mar 15, 2024 at 10:17 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>
> >> Barry Song <21cnbao@xxxxxxxxx> writes:
> >>
> >>> On Fri, Mar 15, 2024 at 9:43 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>>>
> >>>> Barry Song <21cnbao@xxxxxxxxx> writes:
> >>>>
> >>>>> From: Chuanhua Han <hanchuanhua@xxxxxxxx>
> >>>>>
> >>>>> On an embedded system like Android, more than half of anon memory is
> >>>>> actually in swap devices such as zRAM. For example, while an app is
> >>>>> switched to background, its most memory might be swapped-out.
> >>>>>
> >>>>> Now we have mTHP features, unfortunately, if we don't support large folios
> >>>>> swap-in, once those large folios are swapped-out, we immediately lose the
> >>>>> performance gain we can get through large folios and hardware optimization
> >>>>> such as CONT-PTE.
> >>>>>
> >>>>> This patch brings up mTHP swap-in support. Right now, we limit mTHP swap-in
> >>>>> to those contiguous swaps which were likely swapped out from mTHP as a
> >>>>> whole.
> >>>>>
> >>>>> Meanwhile, the current implementation only covers the SWAP_SYCHRONOUS
> >>>>> case. It doesn't support swapin_readahead as large folios yet since this
> >>>>> kind of shared memory is much less than memory mapped by single process.
> >>>>
> >>>> In contrast, I still think that it's better to start with normal swap-in
> >>>> path, then expand to SWAP_SYCHRONOUS case.
> >>>
> >>> I'd rather try the reverse direction as non-sync anon memory is only around
> >>> 3% in a phone as my observation.
> >>
> >> Phone is not the only platform that Linux is running on.
> >
> > I suppose it's generally true that forked shared anonymous pages only
> > constitute a small portion of all anonymous pages. The majority of
> > anonymous pages are within a single process.
> >
> > I agree phones are not the only platform.
> > But Rome wasn't built in a day. I can only get
> > started on a hardware which I can easily reach and have enough hardware/test
> > resources on it. So we may take the first step which can be applied on
> > a real product and improve its performance, and step by step, we broaden
> > it and make it widely useful to various areas in which I can't reach :-)
> >
> > so probably we can have a sysfs "enable" entry with default "n" or
> > have a maximum swap-in order as Ryan's suggestion [1] at the beginning,
>
> I wasn't necessarily suggesting that we should hard-code an upper limit. I was
> just pointing out that we likely need some policy somewhere because the right
> thing very likely depends on the folio size and workload. And there is likely
> similar policy needed for CoW.
>
> We already have per-thp-size directories in sysfs, so there is a natural place
> to add new controls as you suggest - that would fit well. Of course if we can
> avoid exposing yet more controls that would be preferable.
>
> >
> > "
> > So in the common case, swap-in will pull in the same size of folio as was
> > swapped-out. Is that definitely the right policy for all folio sizes? Certainly
> > it makes sense for "small" large folios (e.g. up to 64K IMHO). But I'm not sure
> > it makes sense for 2M THP; As the size increases the chances of actually needing
> > all of the folio reduces so chances are we are wasting IO. There are similar
> > arguments for CoW, where we currently copy 1 page per fault - it probably makes
> > sense to copy the whole folio up to a certain size.
> > "

Right now we have an "enabled" entry for each size, for example:

/sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled

For the phone case it would be quite simple: just enable 64KiB (or also
16KiB) and allow swap-in of 64KiB (or also 16KiB) folios. That doesn't need
any new controls, since do_swap_page() does the same checks as
do_anonymous_page() does.
And we have actually deployed 64KiB-only swap-out and swap-in on millions
of real phones.

Considering other users' scenarios, which might want larger folios such as
2MiB or 1MiB but only want smaller swap-in folio sizes, I could add a new
swapin control like:

/sys/kernel/mm/transparent_hugepage/hugepages-64kB/swapin

This can be 1 or 0.

With this, it seems safer for the patchset to land while I don't have the
ability to extensively test it on Linux servers?

Thanks
Barry
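
As a rough illustration of how a per-size swapin control could combine with the existing per-size enable knobs, here is a minimal userspace model in Python. It is not code from the patchset. The function name, the 4 KiB base page size, and the fallback to the largest smaller permitted size are all assumptions made for the sketch; the thread only proposes that the swapin file holds 1 or 0.

```python
# Hypothetical model of the swap-in size policy discussed in this thread.
# Nothing here is kernel code; names and fallback behaviour are assumptions.

BASE_PAGE_KB = 4  # assumption: 4 KiB base pages


def pick_swapin_size_kb(swapped_out_kb, enabled_kb, swapin_kb):
    """Return the folio size (in KiB) to use when swapping a folio back in.

    swapped_out_kb: size of the folio when it was swapped out
    enabled_kb:     sizes whose per-size enable knob is on
    swapin_kb:      sizes whose (proposed) per-size swapin knob is 1
    """
    # A size is usable for swap-in only if both knobs allow it.
    allowed = set(enabled_kb) & set(swapin_kb)
    # Never swap in more than was swapped out as one contiguous unit.
    candidates = [s for s in allowed if BASE_PAGE_KB < s <= swapped_out_kb]
    # Assumed fallback: largest permitted smaller size, else a single page.
    return max(candidates, default=BASE_PAGE_KB)


if __name__ == "__main__":
    enabled = {16, 64, 2048}   # e.g. 16KiB, 64KiB and 2MiB folios enabled
    swapin = {16, 64}          # but large swap-in only allowed up to 64KiB
    for size in (4, 16, 64, 2048):
        print(size, "->", pick_swapin_size_kb(size, enabled, swapin))
```

In this model a 2MiB folio that was swapped out as a whole comes back in 64KiB units, which is the "larger folios but smaller swap-in sizes" case mentioned above, while the phone configuration (only 64KiB enabled and allowed) swaps in at the same size it swapped out.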