Re: [PATCH 1/5] mm: allow arch refinement/skip for vmap alloc

Maxwell Bland <mbland@xxxxxxxxxxxx> · Thu, 18 Apr 2024 15:52:48 +0000

On Thu, April 18, 2024 at 3:55 AM, Uladzislau Rezki wrote:
> On Tue, Apr 02, 2024 at 03:15:01PM -0500, Maxwell Bland wrote:
> > +extern void insert_vmap_area_augment(struct vmap_area *va, struct rb_node
> > +extern int va_clip(struct rb_root *root, struct list_head *head,
> > +extern struct vmap_area *__find_vmap_area(unsigned long addr,
> To me it looks like you want to make internal functions as public for
> everyone which is not good, imho.

First, thank you for the feedback. I tussled with some of these ideas too while
writing. I will clarify some motivations below and then propose some
alternatives based upon your review.

> arch_skip_va() injections into the search algorithm sounds like a hack and
> might lead(if i do not miss something, need to check closer) to alloc
> failures when we go toward a reserved VA but we are not allowed to allocate
> from.

This is a good insight into the architectural intention here. As is clear, the
underlying goal of this patch is to provide a method for architectures to
enforce their own pseudo-reserved vmalloc regions dynamically.
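
To make the concern concrete, here is a minimal user-space sketch of a
first-fit search with a skip hook. Everything in it is hypothetical: the
free_range structure, the reserved window, and arch_skip_va_sketch() are
simplified stand-ins, not the kernel's vmap_area tree or the patch's actual
hook. The model only skips and never refines the candidate address, which is
the worst case being described.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a free VA range [start, end). */
struct free_range {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical arch-reserved window that ordinary allocations must avoid. */
#define RESV_START 0x3000UL
#define RESV_END   0x5000UL

/* Stand-in for an arch hook: true if [addr, addr + size) overlaps the window. */
static bool arch_skip_va_sketch(unsigned long addr, unsigned long size)
{
    return addr < RESV_END && addr + size > RESV_START;
}

/*
 * First-fit search over the free ranges. Returns the chosen address, or 0 on
 * failure. With use_skip set, a range whose lowest candidate address overlaps
 * the reserved window is skipped entirely, with no attempt to move past it.
 */
static unsigned long alloc_first_fit(const struct free_range *fr, size_t n,
                                     unsigned long size, bool use_skip)
{
    assert(size > 0);

    for (size_t i = 0; i < n; i++) {
        if (fr[i].end - fr[i].start < size)
            continue;   /* range too small */
        if (use_skip && arch_skip_va_sketch(fr[i].start, size))
            continue;   /* arch vetoes this candidate */
        return fr[i].start;
    }
    return 0;
}
```

With free ranges [0x1000, 0x2000) and [0x3000, 0x9000), a request for 0x2000
bytes succeeds at 0x3000 without the hook but fails with it, even though
[0x5000, 0x9000) is both free and outside the reservation.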

Given that, would the potential failures you highlight not be technically
legitimate, with the caveat that any architecture implementing the interface
becomes responsible for maintaining only correct and appropriate reservations?

If so, then the path forward depends on whether we believe that caveat is
reasonable. I am on the fence about whether giving architectures this freedom
is good, so I think it is reasonable to disallow it; see the alternatives
below.

> Why do not you allocate just using a specific range from MODULES_ASLR_START
> till VMALLOC_END?

Mark Rutland has indicated that he does not support a large reduction in the
size of the free region in exchange for ensuring pages are not interleaved.
That is, a dedicated range was my initial approach, but it was deemed unfit.
Strict partitioning creates a trade-off between region size and ASLR
randomization.

To clarify a secondary point, in case this question was more general: allowing
allocations from the VMALLOC_START to VMALLOC_END region and the
MODULES_ASLR_START to MODULES_ASLR_END region to interleave breaks a key use
case: being able to dynamically enforce new PMD-level and coarse-grained
protections (e.g. PXNTable).
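
As a rough illustration of why interleaving matters for table-level
protections, the sketch below checks whether a PMD-sized block could be made
non-executable as a whole. It assumes a 2 MiB PMD (arm64 with 4 KiB pages); the
alloc structure and both helpers are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2 MiB covered per PMD entry (arm64, 4 KiB pages). */
#define PMD_SIZE_SKETCH (1UL << 21)

enum alloc_kind { KIND_DATA, KIND_CODE };

/* Hypothetical record of one allocation covering [start, end). */
struct alloc {
    unsigned long start;
    unsigned long end;
    enum alloc_kind kind;
};

/* True if the allocation intersects the PMD-sized block starting at base. */
static bool touches_block(const struct alloc *a, unsigned long base)
{
    return a->start < base + PMD_SIZE_SKETCH && a->end > base;
}

/*
 * A block can carry a table-level execute-never attribute only if no code
 * allocation lives anywhere inside it.
 */
static bool block_can_be_pxn(const struct alloc *allocs, size_t n,
                             unsigned long base)
{
    assert(base % PMD_SIZE_SKETCH == 0);

    for (size_t i = 0; i < n; i++)
        if (allocs[i].kind == KIND_CODE && touches_block(&allocs[i], base))
            return false;
    return true;
}
```

If a data page and a code page share one 2 MiB block, the check fails for that
block and protection has to fall back to individual PTEs. With code and data
kept in separate blocks, every data-only block passes.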

In case the question is more of a "why are you submitting this in the first
place": non-interleaving simplifies code focused on preventing malicious page
table updates, since we do not need to track every update to PTE-level
descriptors. Verifying individual PTE updates carries a high cost in both
performance and complexity, and happens to lead to hardware-level
privilege-checking race conditions on certain very popular arm64 chipsets.

OK, preamble out of the way:

(1) Would it be OK to export a more generic version of the functions written
in arch/arm64/kernel/vmalloc.c for the following patch?

https://lore.kernel.org/all/20240416122254.868007168-3-mbland@xxxxxxxxxxxx/

That is, move a version of these functions to the main vmalloc.c? This way
these functions are still owned by the right part of the kernel.

Or (2) the exported functions could effectively be duplicated into
architecture-specific code, going "all in" on the caveat mentioned above:
architectures that choose to implement the interface become responsible for
maintaining their reserved code region.

Or (3) a different approach that does not skip the allocation of "bad" VAs but
instead dynamically restructures the tree, potentially by just creating two
trees, one for data and one for code.
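
A very rough user-space sketch of the shape option (3) might take is below.
Each "tree" is replaced by a trivial bump allocator over its own range, so the
names (va_pool, va_space, va_alloc) and the pool bounds are all hypothetical.
How the bounds would be chosen or moved at runtime is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one free-space tree: a bump allocator. */
struct va_pool {
    unsigned long next;   /* next unallocated address */
    unsigned long end;    /* exclusive upper bound */
};

enum va_kind { VA_DATA, VA_CODE };

/* One pool per kind, so code and data can never share a free range. */
struct va_space {
    struct va_pool pools[2];
};

/* Allocate size bytes from the pool matching kind; returns 0 on failure. */
static unsigned long va_alloc(struct va_space *vs, enum va_kind kind,
                              unsigned long size)
{
    assert(kind == VA_DATA || kind == VA_CODE);

    struct va_pool *p = &vs->pools[kind];

    if (size == 0 || p->end - p->next < size)
        return 0;

    unsigned long addr = p->next;
    p->next += size;
    return addr;
}
```

An exhausted code pool fails on its own without touching the data pool, so
there is no search-time skip and no "bad" VA to avoid.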

Thanks and Regards,
Maxwell Bland