On Fri, Apr 12, 2024 at 12:48 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 10.04.24 02:52, Yosry Ahmed wrote: > > [..] > >>> Do we need a separate notifier chain for totalram_pages() updates? > >> > >> Good question. I actually might have the requirement to notify some arch > >> code (s390x) from virtio-mem when fake adding/removing memory, and > >> already wondered how to best wire that up. > >> > >> Maybe we can squeeze that into the existing notifier chain, but needs a > >> bit of thought. > > > > Sorry for the late reply, I had to think about this a bit. > > > Do you mean by adding new actions (e.g. MEM_FAKE_ONLINE, > > MEM_FAKE_OFFLINE), or by reusing the existing actions (MEM_ONLINE, > > MEM_OFFLINE, etc). > > At least for virtio-mem, I think we could have a MEM_ONLINE/MEM_OFFLINE > that prepare the whole range belonging to the Linux memory block > (/sys/devices/system/memory/memory...) to go online, and then have > something like MEM_SOFT_ONLINE/MEM_SOFT_OFFLINE or > ENABLE_PAGES/DISABLE_PAGES ... notifications when parts become usable > (!PageOffline, handed to the buddy) or unusable (PageOffline, removed > from the buddy). > > There are some details to be figured out, but it could work. > > And as virtio-mem currently operates in pageblock granularity (e.g., 2 > MiB), but frequently handles multiple contiguous pageblocks within a > Linux memory block, it's not that bad. > > > But the issue I see with ballooning is that we operate here often on > page granularity. While we could optimize some cases, we might get quite > some overhead from all the notifications. Alternatively, we could send a > list of pages, but it won't win a beauty contest. > > I think the main issue is that, for my purpose (virtio-mem on s390x), I > need to notify about the exact memory ranges (so I can reinitialize > stuff in s390x code when memory gets effectively re-enabled). For other > cases (total pages changing), we don't need the memory ranges, but only > the "summary" -- or a notification afterwards that the total pages were > just changed quite a bit. Thanks for shedding some light on this. Although I am not familiar with ballooning, sending notifications on page granularity updates sounds terrible. It seems like this is not as straightforward as I had anticipated. I was going to take a stab at this, but given that the motivation is a minor optimization on the zswap side, I will probably just give up :) For now, I will drop this optimization from the series for now, and I can revisit it if/when notifications for totalram_pages() are implemented at some point. Please CC me if you do so for the s390x use case :) > > > > > > New actions mean minimal impact to existing notifiers, but it may make > > more sense to reuse MEM_ONLINE and MEM_OFFLINE to have generic actions > > that mean "memory increased" and "memory decreased". > > Likely, we should keep their semantics unchanged. Things like KASAN want > to allocate metadata memory for the whole range, not on some smallish > pieces. It really means "This Linux memory block goes online/offline, > please prepare for that.". And again, memory ballooning with small pages > is a bit problematic. > > > > > I suppose we can add new actions and then separately (and probably > > incrementally) audit existing notifiers to check if they want to > > handle the new actions as well. > > > > Another consideration is that apparently some ballooning drivers also > > register notifiers, so we need to make sure there is no possibility of > > deadlock/recursion. > > Right. > > -- > Cheers, > > David / dhildenb >