I had an offline discussion with Yu on this, and he pointed out something I hadn't realized: the x86 cmpxchg instruction always produces a write cycle, even if it doesn't modify the data - it just writes back the original data in that case. So, get_page_unless_zero will always produce a fault on RO mapped page structures on x86. Maybe this was obvious to other people, but I didn't see it explicitly mentioned, so I figured I'd add the datapoint. - Frank On Thu, Jun 6, 2024 at 1:30 AM David Hildenbrand <david@xxxxxxxxxx> wrote: > > >> Additionally, we also should alter RO permission of those 7 tail pages > >> to RW to avoid panic(). > > > > We can use RCU, which IMO is a better choice, as the following: > > > > get_page_unless_zero() > > { > > int rc = false; > > > > rcu_read_lock(); > > > > if (page_is_fake_head(page) || !page_ref_count(page)) { > > smp_mb(); // implied by atomic_add_unless() > > goto unlock; > > } > > > > rc = page_ref_add_unless(); > > > > unlock: > > rcu_read_unlock(); > > > > return rc; > > } > > > > And on the HVO/de-HOV sides: > > > > folio_ref_unfreeze(); > > synchronize_rcu(); > > HVO/de-HVO; > > > > I think this is a lot better than making tail page metadata RW because: > > 1. it helps debug, IMO, a lot; > > 2. I don't think HVO is the only one that needs this. > > > > David (we missed you in today's THP meeting), > > Sorry, I had a private meeting conflict :) > > > > > Please correct me if I'm wrong -- I think virtio-mem also suffers from > > the same problem when freeing offlined struct page, since I wasn't > > able to find anything that would prevent a **speculative** struct page > > walker from trying to access struct pages belonging to pages being > > concurrently offlined. > > virtio-mem does currently not yet optimize fake-offlined memory like HVO > would. So the only way we really remove "struct page" metadata is by > actually offlining+removing a complete Linux memory block, like ordinary > memory hotunplug would. > > It might be an interesting project to optimize "struct page" metadata > consumption for fake-offlined memory chunks within an online Linux > memory block. > > The biggest challenge might be interaction with memory hotplug, which > requires all "struct page" metadata to be allocated. So that would make > cases where virtio-mem hot-plugs a Linux memory block but keeps parts of > it fake-offline a bit more problematic to handle . > > In a world with memdesc this might all be nicer to handle I think :) > > > There is one possible interaction between virtio-mem and speculative > page references: all fake-offline chunks in a Linux memory block do have > on each page a refcount of 1 and PageOffline() set. When actually > offlining the Linux memory block to remove it, virtio-mem will drop that > reference during MEM_GOING_OFFLINE, such that memory offlining can > proceed (seeing refcount==0 and PageOffline()). > > In virtio_mem_fake_offline_going_offline() we have: > > if (WARN_ON(!page_ref_dec_and_test(page))) > dump_page(page, "fake-offline page referenced"); > > which would trigger on a speculative reference. > > We never saw that trigger so far because quite a long time must have > passed ever since a page might have been part of the page cache / page > tables, before virtio-mem fake-offlined it (using alloc_contig_range()) > and the Linux memory block actually gets offlined. > > But yes, RCU (e.g., on the memory offlining path) would likely be the > right approach to make sure GUP-fast and the pagecache will no longer > grab this page by accident. > > > > > If this is true, we might want to map a "zero struct page" rather than > > leave a hole in vmemmap when offlining pages. And the logic on the hot > > removal side would be similar to that of HVO. > > Once virtio-mem would do something like HVO, yes. Right now virtio-mem > only removes struct-page metadata by removing/unplugging its owned Linux > memory blocks once they are fully "logically offline". > > -- > Cheers, > > David / dhildenb >