On Fri, Jul 26, 2024 at 12:21 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Chris Li <chrisl@xxxxxxxxxx> writes:
>
> > On Thu, Jul 25, 2024 at 10:55 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >>
> >> Chris Li <chrisl@xxxxxxxxxx> writes:
> >>
> >> > On Thu, Jul 25, 2024 at 7:07 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >> >> > If the freeing of swap entry is random distribution. You need 16
> >> >> > continuous swap entries free at the same time at aligned 16 base
> >> >> > locations. The total number of order 4 free swap space add up together
> >> >> > is much lower than the order 0 allocatable swap space.
> >> >> > If having one entry free is 50% probability(swapfile half full), then
> >> >> > having 16 swap entries is continually free is (0.5) EXP 16 = 1.5 E-5.
> >> >> > If the swapfile is 80% full, that number drops to 6.5 E -12.
> >> >>
> >> >> This depends on workloads. Quite some workloads will show some degree
> >> >> of spatial locality. For a workload with no spatial locality at all as
> >> >> above, mTHP may be not a good choice at the first place.
> >> >
> >> > The fragmentation comes from the order 0 entry not from the mTHP. mTHP
> >> > have their own valid usage case, and should be separate from how you
> >> > use the order 0 entry. That is why I consider this kind of strategy
> >> > only works on the lucky case. I would much prefer the strategy that
> >> > can guarantee work not depend on luck.
> >>
> >> It seems that you have some perfect solution. Will learn it when you
> >> post it.
> >
> > No, I don't have perfect solutions. I see putting limit on order 0 swap
> > usage and writing out discontinuous swap entries from a folio are more
> > deterministic and not depend on luck. Both have their price to pay as
> > well.
> >
> >>
> >> >> >> - Order-4 pages need to be swapped out, but not enough order-4 non-full
> >> >> >>   clusters available.
> >> >> >
> >> >> > Exactly.
> >> >> >
> >> >> >>
> >> >> >> So, we need a way to migrate non-full clusters among orders to adjust to
> >> >> >> the various situations automatically.
> >> >> >
> >> >> > There is no easy way to migrate swap entries to different locations.
> >> >> > That is why I like to have discontiguous swap entries allocation for
> >> >> > mTHP.
> >> >>
> >> >> We suggest to migrate non-full swap clusters among different lists, not
> >> >> swap entries.
> >> >
> >> > Then you have the down side of reducing the number of total high order
> >> > clusters. By chance it is much easier to fragment the cluster than
> >> > anti-fragment a cluster. The orders of clusters have a natural
> >> > tendency to move down rather than move up, given long enough time of
> >> > random access. It will likely run out of high order clusters in the
> >> > long run if we don't have any separation of orders.
> >>
> >> As my example above, you may have almost 0 high-order clusters forever.
> >> So, your solution only works for very specific use cases. It's not a
> >> general solution.
> >
> > One simple solution is having an optional limitation of 0 order swap.
> > I understand you don't like that option, but there is no other easy
> > solution to achieve the same effectiveness, so far. If there is, I
> > like to hear it.
>
> Just as you said, it's optional, so it's not general solution. This may
> trigger OOM in general solution.

Agreed, it is not a general solution. Still, this option is simple and useful.

The more general solution is to write out discontiguous swap entries.

Chris
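
The probability estimate quoted earlier in this thread (an aligned run of 16 free entries in a half-full or 80%-full swapfile) can be reproduced with a short sketch. It assumes every swap entry is free independently with the same probability, which is the random-distribution assumption stated in the thread and not a measurement of any real workload. The function name is hypothetical and is not a kernel interface.

```python
def aligned_run_free_probability(free_fraction: float, order: int) -> float:
    """Chance that one aligned block of 2**order swap entries is entirely free.

    Assumption (from the thread, not from measurement): each entry is free
    independently with probability `free_fraction`.
    """
    if not 0.0 <= free_fraction <= 1.0:
        raise ValueError("free_fraction must be within [0, 1]")
    if order < 0:
        raise ValueError("order must be non-negative")
    entries = 1 << order  # an order-4 allocation needs 16 contiguous entries
    return free_fraction ** entries


if __name__ == "__main__":
    for used in (0.5, 0.8):
        p = aligned_run_free_probability(1.0 - used, order=4)
        print(f"swapfile {used:.0%} full: P(aligned order-4 slot free) = {p:.2e}")
```

Under that independence assumption the two cases come out at roughly 1.5e-5 and 6.55e-12, in line with the figures quoted above. Workloads with spatial locality, as raised in the reply, would violate the assumption and give higher values.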