Re: [PATCH RFC 0/4] mm: KUnit tests for the page allocator

Brendan Jackman <jackmanb@xxxxxxxxxx> · Tue, 25 Feb 2025 12:56:32 +0000

On Tue, Feb 25, 2025 at 11:01:47AM +0100, David Hildenbrand wrote:
> > This is an RFC and not a PATCH because:
> > 
> > 1. I have not taken much care to ensure the isolation is complete.
> >     There are probably sources of flakiness and nondeterminism in here.
> > 
> > 2. I suspect the basic idea might be over-complicated: do we really
> >     need memory hotplug here? Do we even need the instance of the
> >     allocator we're testing to have actual memory behind the pages it's
> >     allocating, or could we just hallucinate a new region of vmemmap
> >     without any of that awkwardness?
> > 
> >     One significant downside of relying on memory hotplug is that the
> >     test won't run if we can't hotplug anything out. That means you have
> >     to fiddle with the platform to even run the tests - see the
> >     --kernel_args and --qemu_args I had to add to my kunit.py command
> >     above.
> > 
> >     So yeah, other suggestions welcome.
> > 
> >     2b. I'm not very confident I'm using the hotplug API properly.
> 
> Me neither ;)
> 
> Dynamically adding memory to that "fake" node is certainly interesting, but
> which guarantees do we have that some other features (page migration, memory
> offlining, page reporting ...) don't interact in weird ways with this "fake"
> node? Adding special-casing all over the place for that feels wrong. I
> assume this is point 1. you note above.

Yeah, basically this is the big downside. Changing the system we're
trying to test in order to make it testable can't be avoided entirely,
but I am also pretty unhappy with sprinkling "if (node_isolated(node))"
everywhere.

I guess the ideal approach is one where, instead of having to modify
the meaning of node_data, we just support replacing it completely,
e.g.:

struct page *__alloc_frozen_pages_noprof(gfp_t gfp, unsigned int order,
		int preferred_nid, nodemask_t *nodemask,
		struct pglist_data **node_data)
{
	struct alloc_context ac = { .node_data = node_data };

	// ...
}

Ideally this could be done in such a way that it disappears completely
outside of KUnit builds. The interface should be private, and we'd
wanna get rid of any unnecessary pointer chasing with stuff like:

#ifdef CONFIG_PAGE_ALLOC_KUNIT_TESTS
static inline struct pglist_data *ac_node_data(struct alloc_context *ac, int node)
{
	return ac->node_data[node];
}
#else
#define ac_node_data(ac, node) (NODE_DATA(node))
#endif
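
As a rough illustration of how that accessor would behave, here is a
userspace mock-up (not kernel code). The struct layouts, MAX_NUMNODES
value, global_node_data[] array and the lookup_node_id() and
production_node_id() helpers are all stand-ins invented for the
example; only the ac_node_data() shape comes from the sketch above.

```c
/* Userspace mock-up, not kernel code: every type here is a stand-in. */
#include <assert.h>
#include <stddef.h>

/* Remove this define to get the production path (plain NODE_DATA()). */
#define CONFIG_PAGE_ALLOC_KUNIT_TESTS 1

#define MAX_NUMNODES 4

/* Stand-in for the real struct pglist_data. */
struct pglist_data {
	int node_id;
};

/* Stand-in for the kernel's global per-node data and NODE_DATA(). */
static struct pglist_data global_node_data[MAX_NUMNODES] = {
	{ .node_id = 0 }, { .node_id = 1 }, { .node_id = 2 }, { .node_id = 3 },
};
#define NODE_DATA(nid) (&global_node_data[(nid)])

/* Stand-in for the allocator-private context; only node_data matters here. */
struct alloc_context {
	struct pglist_data **node_data;
};

#ifdef CONFIG_PAGE_ALLOC_KUNIT_TESTS
/* Test builds: look the node up in whatever table the caller passed in. */
static inline struct pglist_data *ac_node_data(struct alloc_context *ac, int node)
{
	return ac->node_data[node];
}
#else
/* Other builds: no extra pointer chasing, ac is ignored entirely. */
#define ac_node_data(ac, node) (NODE_DATA(node))
#endif

/* Hypothetical allocator-internal caller that goes through the accessor. */
int lookup_node_id(struct alloc_context *ac, int node)
{
	return ac_node_data(ac, node)->node_id;
}

/* Hypothetical caller outside the allocator that still uses the globals. */
int production_node_id(int node)
{
	return NODE_DATA(node)->node_id;
}
```

With the define in place, a test can hand the allocator a completely
separate set of nodes while everything else keeps seeing the global
ones, which is the property the node_isolated() approach only
approximates.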

I initially rejected this approach because it felt "too intrusive",
but now that I've actually written this RFC I think it could be less
intrusive than the node_isolated() thing I've proposed here.

The most obvious challenges I can see there are:

- There might be places that page_alloc.c calls out to that care about
  node_data but where we really don't want to plumb the alloc_context
  through (maybe vmscan.c is already such a place)?

- I dunno how many more such helpers we'd need beyond ac_node_data(),
  like do we need ac_nodes_possible_mask() etc etc etc?

But maybe worth a try - can you see any obvious reason this idea is
stupid?

> So I don't quite love the idea on first sight ... but I haven't grasped all
> details of the full picture yet I'm afraid.

Do you have any thoughts about "making up" memory instead of
hot-unplugging real memory for test usage? That might simplify things
a bit?

It seems possible that very little mm code cares if the memory we're
managing actually exists. (For ASI code we did briefly experiment with
tracking information about free pages in the page itself, but it's
pretty sketchy and the presence of debug_pagealloc makes me think
nobody does it today).

There might be arch-specific issues there, but for unit tests it
seems OK if they don't work on every ISA.
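
To make the "made-up memory" idea a bit more concrete, below is a
userspace toy (not kernel code, and not how page_alloc.c works). It
assumes the allocator keeps all of its state in a separate metadata
array, loosely analogous to vmemmap, and never dereferences the pages
themselves. Every name in it (fake_page, fake_memmap, FAKE_BASE_PFN,
the fake_*() functions) is hypothetical.

```c
/* Userspace toy, not kernel code: all names are invented for the example. */
#include <assert.h>
#include <stddef.h>

#define FAKE_NR_PAGES 8
/* Made-up PFN range; nothing needs to be mapped or even exist there. */
#define FAKE_BASE_PFN 0x100000UL

/* Stand-in for struct page: metadata stored apart from the page itself. */
struct fake_page {
	struct fake_page *next_free;
	int allocated;
};

static struct fake_page fake_memmap[FAKE_NR_PAGES];
static struct fake_page *free_list;

/* Translate a metadata entry to the PFN it claims to describe. */
unsigned long fake_page_to_pfn(const struct fake_page *page)
{
	return FAKE_BASE_PFN + (unsigned long)(page - fake_memmap);
}

/* Put every page on the free list, lowest index at the head. */
void fake_init(void)
{
	free_list = NULL;
	for (int i = FAKE_NR_PAGES - 1; i >= 0; i--) {
		fake_memmap[i].allocated = 0;
		fake_memmap[i].next_free = free_list;
		free_list = &fake_memmap[i];
	}
}

/* Pop the head of the free list; only metadata is ever touched. */
struct fake_page *fake_alloc_page(void)
{
	struct fake_page *page = free_list;

	if (!page)
		return NULL;
	free_list = page->next_free;
	page->next_free = NULL;
	page->allocated = 1;
	return page;
}

/* Push a page back onto the head of the free list. */
void fake_free_page(struct fake_page *page)
{
	assert(page->allocated);
	page->allocated = 0;
	page->next_free = free_list;
	free_list = page;
}
```

Nothing above ever reads or writes the address range behind
FAKE_BASE_PFN, so the test passes whether or not that memory exists.
Anything that stores state in the free page itself, or that poisons or
unmaps pages the way debug_pagealloc does, would break this assumption.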