ID stashing seems to cause page->lru.next (aliased to page->pp_magic) to
be in the mmap'able space for some arches.

>
>> It seems it is not really only about kernel pointers, as round_hint_to_min()
>> in mm/mmap.c suggests and the commit log of the above commit 8a5e5e02fc83
>> indicates, if I understand it correctly:
>> "Given unprivileged users cannot mmap anything below mmap_min_addr, it
>> should be safe to use poison pointers lower than mmap_min_addr."
>>
>> And the above "making sure the top-most bit is always 0" doesn't seem to
>> ensure page->signature is outside the range used for kernel pointers
>> on 32-bit arches with VMSPLIT_1G defined, see arch/arm/Kconfig; there
>> is a similar config for x86 too: ...
>
> Ah, interesting, didn't know this was configurable. Okay, so AFAICT, the
> lowest value of PAGE_OFFSET is 0x40000000 (for VMSPLIT_1G), so we need
> to leave two bits off at the top instead of just one. Will update this,
> and try to explain the logic better in the comment.

It seems there was an attempt at doing a 4G/4G split too; that is the kind
of limitation or complexity added to the ARCH and MM subsystems by doing
the ID stashing I mentioned earlier:

https://lore.kernel.org/lkml/Pine.LNX.4.44.0307082332450.17252-100000@localhost.localdomain/

>
>> IMHO, even if some trick like the above is really feasible, it may be
>> adding some limitation or complexity to the ARCH and MM subsystems; for
>> example, stashing the ID in page->signature may cause the 0x*40 signature
>> to be unusable for other poison pointer purposes. It makes more sense to
>> make this obvious by doing the above trick in some MM header file like
>> poison.h instead of in the page_pool subsystem.
>
> AFAIU, PP_SIGNATURE is used by page_pool to be able to distinguish its
> own pages from those allocated elsewhere (cf the description above).
> Which means that these definitions are logically page_pool-internal, and
> thus it makes the most sense to keep them in the page pool headers.
> The only bits the mm subsystem cares about in that field are the bottom
> two (for pfmemalloc pages and compound pages).

All I asked was to move the PP_MAGIC_MASK macro into poison.h, if you
still want to proceed with reusing page->pp_magic, as it seems reasonable
for the mask and the signature being masked to live in the same file.

>
>>>>> Since all the tracking is performed on DMA map/unmap, no additional code
>>>>> is needed in the fast path, meaning the performance overhead of this
>>>>> tracking is negligible. A micro-benchmark shows that the total overhead
>>>>> of using xarray for this purpose is about 400 ns (39 cycles(tsc) 395.218
>>>>> ns; sum for both map and unmap[1]). Since this cost is only paid on DMA
>>>>> map and unmap, it seems like an acceptable cost to fix the late unmap
>>>>
>>>> For most use cases when PP_FLAG_DMA_MAP is set and the IOMMU is off, the
>>>> DMA map and unmap operations are almost negligible as said below, so the
>>>> cost is about a 200% performance degradation, which doesn't seem like an
>>>> acceptable cost.
>>>
>>> I disagree. This only impacts the slow path; as long as pages are
>>> recycled there is no additional cost. While your patch series has
>>> demonstrated that it is *possible* to reduce the cost even in the slow
>>> path, I don't think the complexity cost of this is worth it.
>>
>> It is still the datapath, otherwise there wouldn't be specific testing
>> for that use case; more than 200% performance degradation is too much
>> IMHO.
>
> Do you have a real-world use case (i.e., a networking benchmark, not a
> micro-benchmark of the allocation function) where this change has a
> measurable impact on performance? If so, please do share the numbers!

I don't have one yet.
>
> I very much doubt it will be anything close to that magnitude, but I'm
> always willing to be persuaded by data :)
>
>> Leaving aside the above severe performance degradation, reusing space in
>> page->signature seems to be a tradeoff between adding complexity in the
>> page_pool subsystem and in the VM/ARCH subsystems, as mentioned above.
>
> I think you are overstating the impact on other MM users; this is all
> mostly page_pool-internal logic, cf the explanation above.

To be honest, I am not that familiar with the pointer poisoning mechanism.
But through some research and analysis, it seems worth overstating it a
little, as it appears to be security-related. Cc'ed some security experts
and MLs to see if they can offer some clarification.

>
>>>
>>> [...]
>>>
>>>>> The extra memory needed to track the pages is neatly encapsulated inside
>>>>> xarray, which uses the 'struct xa_node' structure to track items. This
>>>>> structure is 576 bytes long, with slots for 64 items, meaning that a
>>>>> full node incurs only 9 bytes of overhead per slot it tracks (in
>>>>> practice, it probably won't be this efficient, but in any case it should
>>>>
>>>> Is there any debug infrastructure to know if it is not this efficient?
>>>> There may be 576 bytes of overhead for a page in the worst case.
>>>
>>> There's an XA_DEBUG define which enables some dump functions, but I
>>> don't think there's any API to inspect the memory usage. I guess you
>>> could attach a BPF program and walk the structure, or something?
>>>
>>
>> It seems XA_DEBUG is not defined in production environments.
>> And I am not familiar enough with BPF to know whether the BPF
>> approach is feasible in a production environment.
>> Any example of the above BPF program and how to attach it?
>
> Hmm, no, not really, sorry :(
>
> I *think* it should be possible to write a bpftrace script that walks
> the internal xarray tree and counts the nodes along the way, but it's
> not trivial to do, and I haven't tried.
>
> -Toke