On Wed, Sep 06, 2023 at 12:23:21PM +0100, Alexandru Elisei wrote:
> On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:
> > On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:
> > > On 24.08.23 13:06, David Hildenbrand wrote:
> > > > Regarding one complication: "The kernel needs to know where to
> > > > allocate a PROT_MTE page from or migrate a current page if it
> > > > becomes PROT_MTE (mprotect()) and the range it is in does not
> > > > support tagging.", simplified handling would be if it's in a
> > > > MIGRATE_CMA pageblock, it doesn't support tagging. You have to
> > > > migrate to a !CMA page (for example, not specifying GFP_MOVABLE
> > > > as a quick way to achieve that).
> > > 
> > > Okay, I now realize that this patch set effectively duplicates
> > > some CMA behavior using a new migrate-type.
[...]
> I considered mixing the tag storage memory with normal memory and
> adding it to MIGRATE_CMA. But since tag storage memory cannot be
> tagged, it is no longer enough to have a __GFP_MOVABLE allocation
> request to use MIGRATE_CMA.
> 
> I considered two solutions to this problem:
> 
> 1. Only allocate from MIGRATE_CMA if the requested memory is not
> tagged => this effectively means transforming all memory from
> MIGRATE_CMA into the MIGRATE_METADATA migratetype that the series
> introduces. Not very appealing, because that means treating normal
> memory that is also on the MIGRATE_CMA lists as tagged memory.

That's indeed not ideal. We could try this if it makes the patches
significantly simpler, though I'm not so sure.

Allocating metadata is the easier part, as we know the correspondence
from the tagged pages (32 PROT_MTE pages) to the metadata page (1 tag
storage page), so alloc_contig_range() does this for us. Just adding
it to the CMA range is sufficient.

However, making sure that we don't allocate PROT_MTE pages from the
metadata range is what led us to another migrate type. I guess we
could achieve something similar with a new zone or a CPU-less NUMA
node, though the latter is not guaranteed not to allocate memory from
the range, it only makes that less likely. Both these options are less
flexible in terms of size/alignment/placement.

Maybe as a quick hack - only allow PROT_MTE from ZONE_NORMAL and
configure the metadata range in ZONE_MOVABLE - but at some point I'd
expect some CXL-attached memory to support MTE with an additional
carveout reserved.

To recap: in this series, a PROT_MTE page allocation starts with a
typical allocation from anywhere other than MIGRATE_METADATA, followed
by the hooks to reserve the corresponding metadata range at
(pfn * 128 + offset) for a 4K page. The whole metadata page is
reserved, so the adjacent 31 pages around the original allocation can
also be mapped as PROT_MTE.

(Peter and Evgenii @ Google had a slightly different approach in their
prototype: a separate free_area[] array for PROT_MTE pages. While it
has some advantages, I found it more intrusive, since the same page
can be on either the regular or the PROT_MTE free_area/free_list.)
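For illustration only, a minimal sketch of that pfn-to-tag-storage
arithmetic, assuming 4K pages and MTE's 4 bits of tag per 16-byte
granule (the helper names are hypothetical, not the actual functions
from the series):

#include <stdint.h>

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)		/* 4K */
/* 4 bits of tag per 16-byte granule => 128 bytes of tags per 4K page */
#define TAG_BYTES_PER_PAGE	(PAGE_SIZE / 16 / 2)		/* 128 */
/* so one 4K tag storage page covers 32 data pages */
#define PAGES_PER_TAG_PAGE	(PAGE_SIZE / TAG_BYTES_PER_PAGE) /* 32 */

/*
 * Hypothetical helper: the tag storage page holding the tags for a
 * data page. data_pfn is taken relative to the start of the taggable
 * range, tag_base_pfn is the first pfn of the tag storage region.
 */
static inline uint64_t tag_storage_pfn(uint64_t tag_base_pfn,
					uint64_t data_pfn)
{
	return tag_base_pfn + data_pfn / PAGES_PER_TAG_PAGE;
}

/* Hypothetical helper: byte offset of data_pfn's tags in that page. */
static inline unsigned int tag_storage_offset(uint64_t data_pfn)
{
	return (data_pfn % PAGES_PER_TAG_PAGE) * TAG_BYTES_PER_PAGE;
}

Reserving the whole tag storage page returned by tag_storage_pfn() is
what makes the 31 neighbouring data pages taggable as well: their tags
land in the same, already reserved, page.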
> 2. Keep track of which pages are tag storage at page granularity
> (either by a page flag, or by checking that the pfn falls in one of
> the tag storage regions, or by some other mechanism). When the page
> allocator takes free pages from the MIGRATE_METADATA list to satisfy
> an allocation, compare the gfp mask with the page type, and if the
> allocation is tagged and the page is a tag storage page, put it back
> at the tail of the free list and choose the next page. Repeat until
> the page allocator finds a normal memory page that can be tagged
> (some refinements are obviously needed to avoid infinite loops).

With large enough CMA areas, there's a real risk of latency spikes,
RCU stalls etc. Not really keen on such heuristics.

-- 
Catalin