I believe this is a very good fit for tag storage reuse, because it allows
tag storage to be allocated even in atomic contexts, which enables MTE in
the kernel. As a bonus, all of the changes to MM from the current approach
wouldn't be needed, as tag storage allocation can be handled entirely in
set_ptes_at(), copy_*highpage() or arch_swap_restore().
Is this a viable approach that would be upstreamable? Are there other
solutions that I haven't considered? I'm very much open to any alternatives
that would make tag storage reuse viable.
As raised recently, I had similar ideas with something like virtio-mem in
the past (wanted to call it virtio-tmem back then), but didn't have time to
look into it yet.
I considered both using special device memory as a "cleancache" backend and
using it as backend storage for something similar to zswap. We would not
need a memmap/"struct page" for that special device memory, which reduces
memory overhead and makes "adding more memory" a more reliable operation.
Hm... this might not work with tag storage memory: the kernel needs to
perform cache maintenance on the memory whenever it switches between
storing tags and storing data, so the memory must be mapped by the kernel.
I think the direct map will definitely be required (to copy data in/out),
but a memmap for tag memory will likely not be. Of course, it depends on
how tag storage is managed. We likely have to store some metadata;
hopefully we can avoid the full memmap and just use something else.
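To make "something else" a bit more concrete, here is a rough userspace
sketch (not kernel code) that keeps two bits of state per tag storage block
instead of a struct page. TAG_BLOCKS, the three states and all function
names are made up for illustration; a real implementation would also need
locking and the cache maintenance mentioned above on every transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of tag storage blocks in the special region. */
#define TAG_BLOCKS 64

/* Hypothetical per-block states; each fits in two bits. */
enum tag_block_state {
	TAG_BLOCK_FREE = 0,	/* unused */
	TAG_BLOCK_DATA = 1,	/* holds cached/compressed data */
	TAG_BLOCK_TAGS = 2,	/* holds MTE tags */
};

/* Two bits per block, four blocks per byte: no per-block struct page. */
struct tag_region {
	uint8_t state[(TAG_BLOCKS * 2 + 7) / 8];
};

static enum tag_block_state tag_block_get(const struct tag_region *r,
					  size_t idx)
{
	assert(idx < TAG_BLOCKS);
	return (enum tag_block_state)((r->state[idx / 4] >>
				       ((idx % 4) * 2)) & 0x3);
}

static void tag_block_set(struct tag_region *r, size_t idx,
			  enum tag_block_state s)
{
	unsigned int shift = (unsigned int)(idx % 4) * 2;

	assert(idx < TAG_BLOCKS);
	/* Clear the old two-bit field, then store the new state. */
	r->state[idx / 4] = (uint8_t)((r->state[idx / 4] & ~(0x3u << shift)) |
				      ((unsigned int)s << shift));
}

/*
 * Claim a block for tags. Refuses while the block still holds data; the
 * caller would first have to drop or move that data (and, in the kernel,
 * perform cache maintenance before the block changes role).
 */
static bool tag_block_reserve(struct tag_region *r, size_t idx)
{
	if (tag_block_get(r, idx) == TAG_BLOCK_DATA)
		return false;
	tag_block_set(r, idx, TAG_BLOCK_TAGS);
	return true;
}
```

With 64 blocks this is 16 bytes of metadata in total. Whether two bits per
block is actually enough depends on what else needs tracking.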
So I guess instead of ZONE_DEVICE I should try to use arch_add_memory()
directly? That has the limitation that it cannot be used by a driver
(symbol not exported to modules).
You can certainly start with something simple, and we can work on
removing that memmap allocation later.
Maybe we have to expose new primitives in the context of such drivers.
arch_add_memory() likely also doesn't do what you need.
I recall that we had a way of only messing with the direct map.
Last time I worked with that was in the context of memtrace
(arch/powerpc/platforms/powernv/memtrace.c)
There, we call arch_create_linear_mapping()/arch_remove_linear_mapping().
... and now my memory comes back: we never finished factoring out
arch_create_linear_mapping()/arch_remove_linear_mapping() so they would be
available on all architectures.
Your driver will be very arm64 specific, so doing it in an arm64-special
way might be good enough initially. For example, the arm64-core could
detect that special memory region and just statically prepare the direct
map and not expose the memory to the buddy/allocate a memmap. Similar to
how we handle the crashkernel/kexec IIRC (we likely do not have a direct
map for that, though ;) ).
[I was also wondering if we could simply dynamically map/unmap when
required so you can just avoid creating the entire direct map; might not
be the best approach performance-wise, though]
There are a bunch of details to be sorted out, but I don't consider the
directmap/memmap side of things a big problem.
--
Cheers,
David / dhildenb