On Thu, May 16, 2024 at 3:32 PM Karim Manaouil <kmanaouil.dev@xxxxxxxxx> wrote: > > On Thu, May 16, 2024 at 02:05:24PM -0600, Yu Zhao wrote: > > For example, if we have two systems, one has lower fragmentation for > > some orders but higher fragmentation for the rest, and the other is > > the opposite. How would we be able to use a single measure to describe > > this? IOW, I don't think a single measurement can describe all orders > > in a comparable way, which would be the weakest requirement we would > > have to impose. > > > As I (badly) explained earlier, a single value can't do that because > > different orders are not on the same footing (or so to speak), unless > > we are only interested in one non-zero order. So we would need > > fragmentation_index[NR_non_zero_orders]. > > > No, for example, A can allocate 4 order-1 but 0 order-2, and B can > > allocate 2 order-1 *or* 1 order-2, which one would you say is better > > or worse? This, IMO, depends on which order you are trying to > > allocate. Does it make sense? > > But higher order pages can always be broken down into lower order pages. > However, the inverse is not always gauranteed (they may not be buddies, > or compaction/reclaim isn't helpful). Please read my example again, carefully. > Obviously, I would rather have one order-4 page than two order-3 pages. > You can always satisfy an allocation for an order n if a page with an > order higher than n is available. > > One way to measure fragmentation is to compare how far we are from some > perfect value. The perfect value represents the case when all the free > memory is available as blocks of pageblock_order or MAX_PAGE_ORDER. > > I can do this as a one shot calculation, for example with > > static void estimate_numa_fragmentation(void) > { > pg_data_t *pgdat; > struct zone *z; > unsigned long fragscore; > unsigned long bestscore; > unsigned long nr_free; > int order; > > for_each_online_pgdat(pgdat) { > nr_free = fragscore = 0; > z = pgdat->node_zones; > while (z < (pgdat->node_zones + pgdat->nr_zones)) { > if (!populated_zone(z)) { > z++; > continue; > } > spin_lock_irq(&z->lock); > for (order = 0; order < NR_PAGE_ORDERS; order++) { > nr_free += z->free_area[order].nr_free << order; > fragscore += z->free_area[order].nr_free << (order * 2); > } > spin_unlock_irq(&z->lock); > z++; > cond_resched(); > } > bestscore = nr_free << MAX_PAGE_ORDER; > fragscore = ((bestscore - fragscore) * 100) / bestscore; > pr_info("fragscore on node %d: %lu\n", pgdat->node_id, fragscore); > } > } > > But there must be a way to streamline the calculation and update the value > with low overhead over time. > > Cheers > Karim > PhD Student > Edinburgh University