Hi,

On Tue, Feb 20, 2024 at 03:07:22PM +0100, David Hildenbrand wrote:
> > >
> > > With large folios in place, we'd likely want to investigate not working on
> > > individual pages, but on (possibly large) folios instead.
> >
> > Yes, that would be interesting. Since the backend has no way of controlling
> > what tag storage page will be needed for tags, and subsequently dropped
> > from the cache, we would have to figure out what to do if one of the pages
> > that is part of a large folio is dropped. The easiest solution that I can
> > see is to remove the entire folio from the cleancache, but that would mean
> > also dropping the rest of the pages from the folio unnecessarily.
>
> Right, but likely that won't be an issue. Things get interesting when
> thinking about an efficient allocation approach.

Indeed.

>
> >
> > >
> > > >
> > > > I believe this is a very good fit for tag storage reuse, because it allows
> > > > tag storage to be allocated even in atomic contexts, which enables MTE in
> > > > the kernel. As a bonus, all of the changes to MM from the current approach
> > > > wouldn't be needed, as tag storage allocation can be handled entirely in
> > > > set_ptes_at(), copy_*highpage() or arch_swap_restore().
> > > >
> > > > Is this a viable approach that would be upstreamable? Are there other
> > > > solutions that I haven't considered? I'm very much open to any alternatives
> > > > that would make tag storage reuse viable.
> > >
> > > As raised recently, I had similar ideas with something like virtio-mem in
> > > the past (wanted to call it virtio-tmem back then), but didn't have time to
> > > look into it yet.
> > >
> > > I considered both, using special device memory as "cleancache" backend, and
> > > using it as backend storage for something similar to zswap. We would not
> > > need a memmap/"struct page" for that special device memory, which reduces
> > > memory overhead and makes "adding more memory" a more reliable operation.
> >
> > Hm... this might not work with tag storage memory, the kernel needs to
> > perform cache maintenance on the memory when it transitions to and from
> > storing tags and storing data, so the memory must be mapped by the kernel.
>
> The direct map will definitely be required I think (copy in/out data). But
> memmap for tag memory will likely not be required. Of course, it depends how
> to manage tag storage. Likely we have to store some metadata, hopefully we
> can avoid the full memmap and just use something else.

So I guess instead of ZONE_DEVICE I should try to use arch_add_memory()
directly? That has the limitation that it cannot be used by a driver
(symbol not exported to modules).

Thanks,
Alex