arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA

Alexandru Elisei <alexandru.elisei@xxxxxxx> · Tue, 20 Feb 2024 11:26:38 +0000

Hello,

This is a request to discuss alternatives to the current approach for
reusing the MTE tag storage memory for data allocations [1]. Each iteration
of the series has uncovered new issues, the latest being that memory
allocation is performed in atomic contexts [2]. I would like to start a
discussion about possible alternatives which would integrate better with
the memory management code.

This is a high level overview of the current approach:

 * Tag storage pages are put on the MIGRATE_CMA lists, meaning they can be
   used for data allocations like (almost) any other page in the system.

 * When a page is allocated as tagged, the corresponding tag storage is
   also allocated.

 * There's a static relationship between a page and the location in memory
   where its tags are stored. Because of this, if the corresponding tag
   storage is in use for data when the page is allocated as tagged, the
   data in the tag storage page is migrated elsewhere.
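To make the static relationship concrete: MTE stores a 4-bit tag for every
16-byte granule, so a 4 KiB data page needs 128 bytes of tags and one 4 KiB
tag storage page covers 32 data pages. The sketch below (plain userspace C,
not kernel code) computes where the tags for a physical address live and
which tag storage page must be reserved before a data page can be tagged.
The single contiguous data/tag region and the names tag_region, tag_addr()
and tag_storage_pfn() are hypothetical; the real layout is platform
specific.

```c
#include <assert.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16      /* bytes covered by one tag */
#define MTE_TAG_BITS     4       /* bits per tag */
#define PAGE_SZ          4096UL  /* assumed 4 KiB pages */

/* Tag bytes needed for one data page: 4096 / 16 * 4 / 8 = 128. */
#define TAG_BYTES_PER_PAGE (PAGE_SZ / MTE_GRANULE_SIZE * MTE_TAG_BITS / 8)
/* Data pages whose tags fit in one tag storage page: 4096 / 128 = 32. */
#define DATA_PAGES_PER_TAG_PAGE (PAGE_SZ / TAG_BYTES_PER_PAGE)

/* Hypothetical layout: one contiguous tagged region and its tag storage. */
struct tag_region {
	uint64_t data_base; /* physical address of the first tag-capable byte */
	uint64_t tag_base;  /* physical address of the matching tag storage */
};

/* Physical address of the byte holding the tag for address pa
 * (two consecutive granules share one byte). */
static uint64_t tag_addr(const struct tag_region *r, uint64_t pa)
{
	assert(pa >= r->data_base);
	uint64_t granule = (pa - r->data_base) / MTE_GRANULE_SIZE;

	return r->tag_base + granule * MTE_TAG_BITS / 8;
}

/* PFN of the tag storage page that must be reserved before data_pfn
 * can be tagged. */
static uint64_t tag_storage_pfn(const struct tag_region *r, uint64_t data_pfn)
{
	return tag_addr(r, data_pfn * PAGE_SZ) / PAGE_SZ;
}
```

With this mapping, tagging any one of the 32 data pages covered by a tag
storage page requires that whole tag storage page, which is why a single
data allocation sitting in tag storage can force a migration.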

Although this is the most generic approach because tag storage pages are
treated like normal pages, it has some disadvantages:

 * HW KASAN (MTE in the kernel) cannot be used, because the kernel
   allocates memory in atomic contexts, where migration is not possible.

 * Tag storage pages cannot themselves be tagged, and this means that all
   CMA pages, even those which aren't tag storage, cannot be used for
   tagged allocations.

 * Page migration is costly, and a process that uses MTE can experience
   measurable slowdowns if the tag storage it requires is in use for data.
   There might be ways to reduce this cost (by reducing the likelihood that
   tag storage pages are allocated), but it cannot be completely
   eliminated.

 * Worse yet, a userspace process can use a tag storage page in such a way
   that migration is effectively impossible [3],[4].  A malicious process
   can make use of this to prevent the allocation of tag storage for other
   processes in the system, leading to a degraded experience for the
   affected processes. In the worst case, progress becomes impossible for
   those processes.

One alternative approach I'm looking at right now is cleancache. Cleancache
was removed in v5.17 (commit 0a4ee518185e) because the only backend, the
tmem driver, had been removed earlier (in v5.3, commit 814bbf49dcd0).

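With this approach, MTE tag storage would be implemented as a cleancache
backend driver. When a tag storage page is needed for storing tags, the
page would simply be dropped from the cache (a subsequent
cleancache_get_page() for it returns -1). Because cleancache only holds
clean pages, dropping one loses no data: on a miss, the page is read again
from its backing store.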
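A rough model of the proposed backend, written as userspace C for
illustration: tag storage pages form a pool that can hold copies of clean
pages, put may silently decline, get returns -1 on a miss, and claiming a
page for tags just discards whatever it cached. The tc_* names, the
fixed-size pool and the 64-bit key are hypothetical; the hooks are only
loosely modelled on the put/get callbacks of the removed cleancache_ops and
do not reproduce their signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ      4096
#define NR_TAG_PAGES 4 /* hypothetical: a tiny pool for illustration */

/* A tag storage page: caching one clean page, holding tags, or free. */
struct tag_page {
	bool in_use_for_tags;  /* reserved for MTE tags, invisible to the cache */
	bool has_cached_data;  /* holds a copy of a clean page-cache page */
	uint64_t key;          /* hypothetical identifier of the cached page */
	unsigned char data[PAGE_SZ];
};

static struct tag_page pool[NR_TAG_PAGES];

/* put: cache a clean page if a free tag storage page exists. Declining is
 * always allowed, because the original stays in its backing store. */
static void tc_put_page(uint64_t key, const void *src)
{
	for (int i = 0; i < NR_TAG_PAGES; i++) {
		struct tag_page *p = &pool[i];

		if (!p->in_use_for_tags && !p->has_cached_data) {
			memcpy(p->data, src, PAGE_SZ);
			p->key = key;
			p->has_cached_data = true;
			return;
		}
	}
}

/* get: copy the page out and return 0 on a hit, return -1 on a miss
 * (the caller then reads the page from disk as usual). */
static int tc_get_page(uint64_t key, void *dst)
{
	for (int i = 0; i < NR_TAG_PAGES; i++) {
		struct tag_page *p = &pool[i];

		if (p->has_cached_data && p->key == key) {
			memcpy(dst, p->data, PAGE_SZ);
			return 0;
		}
	}
	return -1;
}

/* Claim a tag storage page for tags: the cached copy is simply dropped.
 * No sleeping and no migration, so this could run in atomic context. */
static void tc_reserve_for_tags(int idx)
{
	assert(idx >= 0 && idx < NR_TAG_PAGES);
	pool[idx].has_cached_data = false;
	pool[idx].in_use_for_tags = true;
}
```

The key property is that tc_reserve_for_tags() never has to wait for
anything: there is no data to move, only a cached copy to forget.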

I believe this is a very good fit for tag storage reuse, because it allows
tag storage to be allocated even in atomic contexts, which enables MTE in
the kernel. As a bonus, none of the MM changes required by the current
approach would be needed, as tag storage allocation can be handled entirely
in set_ptes_at(), copy_*highpage() or arch_swap_restore().
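The same idea seen from the mapping side: since 32 data pages share one tag
storage page, the arch code would need to count how many tagged pages use
each tag storage page, evicting the cached contents on the first user and
returning the page to the cache after the last one goes away. This is a
hypothetical, single-threaded userspace model; reserve_tag_storage(),
release_tag_storage() and the fixed pool are illustrative names only, and
real code called from set_ptes_at() would additionally need a spinlock or
atomic counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DATA_PAGES_PER_TAG_PAGE 32 /* 4 KiB pages, 4-bit tags per 16 bytes */
#define NR_TAG_PAGES            8  /* hypothetical pool size */

struct tag_page_state {
	unsigned int tagged_users; /* tagged data pages relying on this page */
	bool has_cached_data;      /* currently backing a cache entry */
};

static struct tag_page_state tag_pages[NR_TAG_PAGES];

/* Hypothetical mapping: the data region is assumed to start at PFN 0. */
static unsigned int tag_index(uint64_t data_pfn)
{
	return data_pfn / DATA_PAGES_PER_TAG_PAGE;
}

/* Called once when a data page becomes tagged. Only in-memory state is
 * updated: no sleeping and no migration (locking omitted in this model). */
static void reserve_tag_storage(uint64_t data_pfn)
{
	struct tag_page_state *t = &tag_pages[tag_index(data_pfn)];

	if (t->tagged_users++ == 0)
		t->has_cached_data = false; /* drop the clean copy; later gets miss */
}

/* Called when a data page stops being tagged. */
static void release_tag_storage(uint64_t data_pfn)
{
	struct tag_page_state *t = &tag_pages[tag_index(data_pfn)];

	assert(t->tagged_users > 0);
	t->tagged_users--;
}

/* A tag storage page can go back to the cache once no tagged page uses it. */
static bool tag_page_available_to_cache(unsigned int idx)
{
	return tag_pages[idx].tagged_users == 0;
}
```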

Is this a viable approach that would be upstreamable? Are there other
solutions that I haven't considered? I'm very much open to any alternatives
that would make tag storage reuse viable.

[1] https://lore.kernel.org/all/20240125164256.4147-1-alexandru.elisei@xxxxxxx/
[2] https://lore.kernel.org/all/CAMn1gO7M51QtxPxkRO3ogH1zasd2-vErWqoPTqGoPiEvr8Pvcw@xxxxxxxxxxxxxx/
[3] https://lore.kernel.org/linux-trace-kernel/4e7a4054-092c-4e34-ae00-0105d7c9343c@xxxxxxxxxx/
[4] https://lore.kernel.org/linux-trace-kernel/92833873-cd70-44b0-9f34-f4ac11b9e498@xxxxxxxxxx/

Thanks,
Alex