Re: [LSFMM] automating measuring memory fragmentation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 16, 2024 at 02:05:24PM -0600, Yu Zhao wrote: 
> For example, if we have two systems, one has lower fragmentation for
> some orders but higher fragmentation for the rest, and the other is
> the opposite. How would we be able to use a single measure to describe
> this? IOW, I don't think a single measurement can describe all orders
> in a comparable way, which would be the weakest requirement we would
> have to impose.

> As I (badly) explained earlier, a single value can't do that because
> different orders are not on the same footing (or so to speak), unless
> we are only interested in one non-zero order. So we would need
> fragmentation_index[NR_non_zero_orders].

> No, for example, A can allocate 4 order-1 but 0 order-2, and B can
> allocate 2 order-1 *or* 1 order-2, which one would you say is better
> or worse? This, IMO, depends on which order you are trying to
> allocate. Does it make sense?

But higher order pages can always be broken down into lower order pages.
However, the inverse is not always gauranteed (they may not be buddies,
or compaction/reclaim isn't helpful).

Obviously, I would rather have one order-4 page than two order-3 pages.
You can always satisfy an allocation for an order n if a page with an
order higher than n is available.

One way to measure fragmentation is to compare how far we are from some 
perfect value. The perfect value represents the case when all the free
memory is available as blocks of pageblock_order or MAX_PAGE_ORDER.

I can do this as a one shot calculation, for example with

static void estimate_numa_fragmentation(void)
{
	pg_data_t *pgdat;
	struct zone *z;
	unsigned long fragscore;
	unsigned long bestscore;
	unsigned long nr_free;
	int order;

	for_each_online_pgdat(pgdat) {
		nr_free = fragscore = 0;
		z = pgdat->node_zones;
		while (z < (pgdat->node_zones + pgdat->nr_zones)) {
			if (!populated_zone(z)) {
				z++;
				continue;
			}
			spin_lock_irq(&z->lock);
			for (order = 0; order < NR_PAGE_ORDERS; order++) {
				nr_free += z->free_area[order].nr_free << order;
				fragscore += z->free_area[order].nr_free << (order * 2);
			}
			spin_unlock_irq(&z->lock);
			z++;
			cond_resched();
		}
		bestscore = nr_free << MAX_PAGE_ORDER;
		fragscore = ((bestscore - fragscore) * 100) / bestscore;
		pr_info("fragscore on node %d: %lu\n", pgdat->node_id, fragscore);
	}
}

But there must be a way to streamline the calculation and update the value
with low overhead over time.

Cheers
Karim
PhD Student
Edinburgh University




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux