On Mon, Apr 15, 2024 at 8:10 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 13.04.24 03:05, Yosry Ahmed wrote:
> > On Fri, Apr 12, 2024 at 12:48 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 10.04.24 02:52, Yosry Ahmed wrote:
> >>> [..]
> >>>>> Do we need a separate notifier chain for totalram_pages() updates?
> >>>>
> >>>> Good question. I actually might have the requirement to notify some arch
> >>>> code (s390x) from virtio-mem when fake adding/removing memory, and
> >>>> already wondered how to best wire that up.
> >>>>
> >>>> Maybe we can squeeze that into the existing notifier chain, but needs a
> >>>> bit of thought.
> >>>
> >>
> >> Sorry for the late reply, I had to think about this a bit.
> >>
> >>> Do you mean by adding new actions (e.g. MEM_FAKE_ONLINE,
> >>> MEM_FAKE_OFFLINE), or by reusing the existing actions (MEM_ONLINE,
> >>> MEM_OFFLINE, etc).
> >>
> >> At least for virtio-mem, I think we could have a MEM_ONLINE/MEM_OFFLINE
> >> that prepare the whole range belonging to the Linux memory block
> >> (/sys/devices/system/memory/memory...) to go online, and then have
> >> something like MEM_SOFT_ONLINE/MEM_SOFT_OFFLINE or
> >> ENABLE_PAGES/DISABLE_PAGES ... notifications when parts become usable
> >> (!PageOffline, handed to the buddy) or unusable (PageOffline, removed
> >> from the buddy).
> >>
> >> There are some details to be figured out, but it could work.
> >>
> >> And as virtio-mem currently operates in pageblock granularity (e.g., 2
> >> MiB), but frequently handles multiple contiguous pageblocks within a
> >> Linux memory block, it's not that bad.
> >>
> >>
> >> But the issue I see with ballooning is that we operate here often on
> >> page granularity. While we could optimize some cases, we might get quite
> >> some overhead from all the notifications. Alternatively, we could send a
> >> list of pages, but it won't win a beauty contest.
> >>
> >> I think the main issue is that, for my purpose (virtio-mem on s390x), I
> >> need to notify about the exact memory ranges (so I can reinitialize
> >> stuff in s390x code when memory gets effectively re-enabled). For other
> >> cases (total pages changing), we don't need the memory ranges, but only
> >> the "summary" -- or a notification afterwards that the total pages were
> >> just changed quite a bit.
> >
> >
> > Thanks for shedding some light on this. Although I am not familiar
> > with ballooning, sending notifications on page granularity updates
> > sounds terrible. It seems like this is not as straightforward as I had
> > anticipated.
> >
> > I was going to take a stab at this, but given that the motivation is a
> > minor optimization on the zswap side, I will probably just give up :)
>
> Oh no, so I have to do the work! ;)
>
> >
> > For now, I will drop this optimization from the series for now, and I
> > can revisit it if/when notifications for totalram_pages() are
> > implemented at some point. Please CC me if you do so for the s390x use
> > case :)
>
> I primarily care about virtio-mem resizing VM memory and adjusting
> totalram_pages(), memory ballooning for that is rather a hack for that
> use case ... so we're in agreement :)
>
> Likely we'd want two notification mechanisms, but no matter how I look
> at it, it's all a bit ugly.

I am assuming you mean one with exact memory ranges for your s390x use
case, and one high-level mechanism for totalram_pages() updates -- or
did I miss the point?

I am interested to see how page granularity updates would be handled
in this case. Perhaps they are only relevant for the high-level
mechanism? In that case, I suppose we can batch updates and notify
once when a threshold is crossed or something.

>
> I'll look into the virtio-mem case soonish and will let you know once I
> have something running.

Thanks!
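
To make the "batch updates and notify once when a threshold is crossed"
idea a bit more concrete, here is a rough userspace sketch. Everything in
it is an assumption for illustration only: struct totalram_batcher,
batcher_init() and batcher_adjust() are hypothetical names, not existing
kernel interfaces, and a real version would need locking and would call
into a notifier chain instead of returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical batching state (not a kernel structure): tracks the
 * running page total and the total that listeners were last told about.
 */
struct totalram_batcher {
	long total_pages;               /* current total, updated per page */
	long last_notified;             /* total at the last notification */
	long threshold;                 /* drift (in pages) that triggers a notification */
	unsigned long nr_notifications; /* how many notifications were raised */
};

static void batcher_init(struct totalram_batcher *b, long total, long threshold)
{
	assert(b != NULL && threshold > 0);
	b->total_pages = total;
	b->last_notified = total;
	b->threshold = threshold;
	b->nr_notifications = 0;
}

/*
 * Apply a (possibly single-page) delta. Returns true only when the total
 * has drifted at least 'threshold' pages away from the last notified
 * value, i.e. when listeners should be notified once for the whole batch.
 */
static bool batcher_adjust(struct totalram_batcher *b, long delta)
{
	long drift;

	b->total_pages += delta;
	drift = b->total_pages - b->last_notified;
	if (drift < 0)
		drift = -drift;

	if (drift < b->threshold)
		return false;

	b->last_notified = b->total_pages;
	b->nr_notifications++;
	return true;
}
```

With something like this, a balloon inflating page by page would only
raise one notification per 'threshold' pages of net change, and
inflate/deflate ping-pong that stays within the threshold would raise
none at all.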