On 19/03/2024 09:20, Huang, Ying wrote:
> Ryan Roberts <ryan.roberts@xxxxxxx> writes:
>
>>>>> I agree phones are not the only platform. But Rome wasn't built in a
>>>>> day. I can only get started on hardware which I can easily reach and
>>>>> have enough hardware/test resources on. So we may take the first step,
>>>>> which can be applied on a real product and improve its performance,
>>>>> and step by step broaden it and make it widely useful to various areas
>>>>> which I can't reach :-)
>>>>
>>>> We must guarantee the normal swap path runs correctly and has no
>>>> performance regression while developing the SWP_SYNCHRONOUS_IO
>>>> optimization, so we have to put some effort into testing the normal
>>>> path anyway.
>>>>
>>>>> so probably we can have a sysfs "enable" entry with default "n" or
>>>>> have a maximum swap-in order, as Ryan suggested [1] at the beginning:
>>>>>
>>>>> "
>>>>> So in the common case, swap-in will pull in the same size of folio as
>>>>> was swapped-out. Is that definitely the right policy for all folio
>>>>> sizes? Certainly it makes sense for "small" large folios (e.g. up to
>>>>> 64K IMHO). But I'm not sure it makes sense for 2M THP; as the size
>>>>> increases, the chances of actually needing all of the folio reduce, so
>>>>> chances are we are wasting IO. There are similar arguments for CoW,
>>>>> where we currently copy 1 page per fault - it probably makes sense to
>>>>> copy the whole folio up to a certain size.
>>>>> "
>>
>> I thought about this a bit more. No clear conclusions, but hoped this
>> might help the discussion around policy:
>>
>> The decision about the size of the THP is made at first fault, with some
>> help from user space, and in future we might make decisions to split
>> based on munmap/mremap/etc hints. In an ideal world, the fact that we
>> have had to swap the THP out at some point in its lifetime should not
>> impact its size. It's just being moved around in the system and the
>> reason for our original decision should still hold.
>>
>> So from that PoV, it would be good to swap-in to the same size that was
>> swapped-out.
>
> Sorry, I don't agree with this. It's better to swap-in and swap-out in
> the smallest size if the page is only accessed seldom, to avoid wasting
> memory.

If we want to optimize only for memory consumption, I'm sure there are
many things we would do differently. We need to find a balance between
memory and performance. The benefits of folios are well documented and the
kernel is heading in the direction of managing memory in variable-sized
blocks. So I don't think it's as simple as saying we should always swap-in
the smallest possible amount of memory.

You also said we should swap *out* in the smallest size possible. Have I
misunderstood you? I thought the case for swapping out a whole folio
without splitting was well established and non-controversial?

>
>> But we only kind-of keep that information around, via the swap entry
>> contiguity and alignment. With that scheme it is possible that multiple
>> virtually adjacent but not physically contiguous folios get swapped-out
>> to adjacent swap slot ranges and would then be swapped-in to a single,
>> larger folio. This is not ideal, and I think it would be valuable to try
>> to maintain the original folio size information with the swap slot. One
>> way to do this would be to store, in the cluster, the original order for
>> which the cluster was allocated. Then we at least know that a given swap
>> slot is either for a folio of that order or for an order-0 folio (due to
>> cluster exhaustion/scanning). Can we steal a bit from swap_map to
>> determine which case it is? Or are there better approaches?
>
> [snip]
>
> --
> Best Regards,
> Huang, Ying
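To make that last idea a bit more concrete, here is a rough, purely
illustrative, userspace-runnable sketch of the shape I have in mind. The
struct cluster, its order field, the map array and the SLOT_ORDER0_FALLBACK
bit are all made-up names that only loosely mirror the kernel's
swap_cluster_info and swap_map; whether a spare bit is actually available in
swap_map is exactly the open question above.

/*
 * Illustrative sketch only, not a kernel patch: a cluster remembers the
 * folio order it was allocated for, and one spare bit per swap_map entry
 * marks slots that were later handed out as order-0 fallbacks once the
 * cluster became subject to scanning. All names and the choice of bit
 * are hypothetical.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SLOTS_PER_CLUSTER       512     /* 2MB cluster / 4KB slots */
#define SLOT_ORDER0_FALLBACK    0x80    /* hypothetical stolen bit */
#define SLOT_COUNT_MASK         0x7f    /* remaining bits: usage count */

struct cluster {
        uint8_t order;                  /* folio order this cluster serves */
        uint8_t map[SLOTS_PER_CLUSTER]; /* stand-in for swap_map entries */
};

/* Allocate 1 << order contiguous slots for a large folio. */
static void alloc_folio_slots(struct cluster *c, unsigned int slot,
                              unsigned int order)
{
        assert(order == c->order);
        for (unsigned int i = 0; i < (1u << order); i++)
                c->map[slot + i] = 1;   /* count = 1, fallback bit clear */
}

/* Allocate a single slot scavenged from a partially used cluster. */
static void alloc_order0_fallback(struct cluster *c, unsigned int slot)
{
        c->map[slot] = 1 | SLOT_ORDER0_FALLBACK;
}

/* Swap-in policy query: what order was this slot originally part of? */
static unsigned int slot_original_order(const struct cluster *c,
                                        unsigned int slot)
{
        if (c->map[slot] & SLOT_ORDER0_FALLBACK)
                return 0;               /* scanned slot: order-0 folio */
        return c->order;                /* otherwise the cluster's order */
}

int main(void)
{
        struct cluster c = { .order = 4 };      /* cluster opened for 64K folios */

        alloc_folio_slots(&c, 0, 4);            /* one 64K folio in slots 0..15 */
        alloc_order0_fallback(&c, 16);          /* later, a scanned order-0 slot */

        printf("slot 3  -> original order %u\n", slot_original_order(&c, 3));
        printf("slot 16 -> original order %u\n", slot_original_order(&c, 16));
        return 0;
}

The attraction of this layout is that the swap-in path would only need the
cluster's order plus one bit per slot to recover the original folio size;
the per-slot usage counts would be left untouched.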