Re: [LSFMM] automating measuring memory fragmentation

Yu Zhao <yuzhao@xxxxxxxxxx> · Thu, 16 May 2024 15:36:57 -0600

On Thu, May 16, 2024 at 3:32 PM Karim Manaouil <kmanaouil.dev@xxxxxxxxx> wrote:
>
> On Thu, May 16, 2024 at 02:05:24PM -0600, Yu Zhao wrote:
> > For example, if we have two systems, one has lower fragmentation for
> > some orders but higher fragmentation for the rest, and the other is
> > the opposite. How would we be able to use a single measure to describe
> > this? IOW, I don't think a single measurement can describe all orders
> > in a comparable way, which would be the weakest requirement we would
> > have to impose.
>
> > As I (badly) explained earlier, a single value can't do that because
> > different orders are not on the same footing (or so to speak), unless
> > we are only interested in one non-zero order. So we would need
> > fragmentation_index[NR_non_zero_orders].
>
> > No, for example, A can allocate 4 order-1 but 0 order-2, and B can
> > allocate 2 order-1 *or* 1 order-2, which one would you say is better
> > or worse? This, IMO, depends on which order you are trying to
> > allocate. Does it make sense?
>
> But higher order pages can always be broken down into lower order pages.
> However, the inverse is not always gauranteed (they may not be buddies,
> or compaction/reclaim isn't helpful).

Please read my example again, carefully.

> Obviously, I would rather have one order-4 page than two order-3 pages.
> You can always satisfy an allocation for an order n if a page with an
> order higher than n is available.
>
> One way to measure fragmentation is to compare how far we are from some
> perfect value. The perfect value represents the case when all the free
> memory is available as blocks of pageblock_order or MAX_PAGE_ORDER.
>
> I can do this as a one shot calculation, for example with
>
> static void estimate_numa_fragmentation(void)
> {
>         pg_data_t *pgdat;
>         struct zone *z;
>         unsigned long fragscore;
>         unsigned long bestscore;
>         unsigned long nr_free;
>         int order;
>
>         for_each_online_pgdat(pgdat) {
>                 nr_free = fragscore = 0;
>                 z = pgdat->node_zones;
>                 while (z < (pgdat->node_zones + pgdat->nr_zones)) {
>                         if (!populated_zone(z)) {
>                                 z++;
>                                 continue;
>                         }
>                         spin_lock_irq(&z->lock);
>                         for (order = 0; order < NR_PAGE_ORDERS; order++) {
>                                 nr_free += z->free_area[order].nr_free << order;
>                                 fragscore += z->free_area[order].nr_free << (order * 2);
>                         }
>                         spin_unlock_irq(&z->lock);
>                         z++;
>                         cond_resched();
>                 }
>                 bestscore = nr_free << MAX_PAGE_ORDER;
>                 fragscore = ((bestscore - fragscore) * 100) / bestscore;
>                 pr_info("fragscore on node %d: %lu\n", pgdat->node_id, fragscore);
>         }
> }
>
> But there must be a way to streamline the calculation and update the value
> with low overhead over time.
>
> Cheers
> Karim
> PhD Student
> Edinburgh University