On Mon, Apr 15, 2024 at 12:15 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 15.04.24 20:30, Yosry Ahmed wrote:
> > On Mon, Apr 15, 2024 at 8:10 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 13.04.24 03:05, Yosry Ahmed wrote:
> >>> On Fri, Apr 12, 2024 at 12:48 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 10.04.24 02:52, Yosry Ahmed wrote:
> >>>>> [..]
> >>>>>>> Do we need a separate notifier chain for totalram_pages() updates?
> >>>>>>
> >>>>>> Good question. I actually might have the requirement to notify some
> >>>>>> arch code (s390x) from virtio-mem when fake adding/removing memory,
> >>>>>> and already wondered how to best wire that up.
> >>>>>>
> >>>>>> Maybe we can squeeze that into the existing notifier chain, but it
> >>>>>> needs a bit of thought.
> >>>>>
> >>>>
> >>>> Sorry for the late reply, I had to think about this a bit.
> >>>>
> >>>>> Do you mean by adding new actions (e.g. MEM_FAKE_ONLINE,
> >>>>> MEM_FAKE_OFFLINE), or by reusing the existing actions (MEM_ONLINE,
> >>>>> MEM_OFFLINE, etc.)?
> >>>>
> >>>> At least for virtio-mem, I think we could have a MEM_ONLINE/MEM_OFFLINE
> >>>> that prepares the whole range belonging to the Linux memory block
> >>>> (/sys/devices/system/memory/memory...) to go online, and then have
> >>>> something like MEM_SOFT_ONLINE/MEM_SOFT_OFFLINE or
> >>>> ENABLE_PAGES/DISABLE_PAGES ... notifications when parts become usable
> >>>> (!PageOffline, handed to the buddy) or unusable (PageOffline, removed
> >>>> from the buddy).
> >>>>
> >>>> There are some details to be figured out, but it could work.
> >>>>
> >>>> And as virtio-mem currently operates at pageblock granularity (e.g., 2
> >>>> MiB), but frequently handles multiple contiguous pageblocks within a
> >>>> Linux memory block, it's not that bad.
> >>>>
> >>>> But the issue I see with ballooning is that we often operate at page
> >>>> granularity there. While we could optimize some cases, we might get
> >>>> quite some overhead from all the notifications. Alternatively, we could
> >>>> send a list of pages, but it won't win a beauty contest.
> >>>>
> >>>> I think the main issue is that, for my purpose (virtio-mem on s390x), I
> >>>> need to notify about the exact memory ranges (so I can reinitialize
> >>>> stuff in s390x code when memory gets effectively re-enabled). For other
> >>>> cases (total pages changing), we don't need the memory ranges, but only
> >>>> the "summary" -- or a notification afterwards that the total pages were
> >>>> just changed quite a bit.
> >>>
> >>> Thanks for shedding some light on this. Although I am not familiar
> >>> with ballooning, sending notifications for page-granularity updates
> >>> sounds terrible. It seems like this is not as straightforward as I had
> >>> anticipated.
> >>>
> >>> I was going to take a stab at this, but given that the motivation is a
> >>> minor optimization on the zswap side, I will probably just give up :)
> >>
> >> Oh no, so I have to do the work! ;)
> >>
> >>> For now, I will drop this optimization from the series, and I can
> >>> revisit it if/when notifications for totalram_pages() are implemented
> >>> at some point. Please CC me if you do so for the s390x use case :)
> >>
> >> I primarily care about virtio-mem resizing VM memory and adjusting
> >> totalram_pages(); memory ballooning is rather a hack for that use
> >> case ... so we're in agreement :)
> >>
> >> Likely we'd want two notification mechanisms, but no matter how I look
> >> at it, it's all a bit ugly.
> >
> > I am assuming you mean one with exact memory ranges for your s390x use
> > case, and one high-level mechanism for totalram_pages() updates -- or
> > did I miss the point?
>
> No, that's it.
>
> > I am interested to see how page granularity updates would be handled
> > in this case. Perhaps they are only relevant for the high-level
> > mechanism? In that case, I suppose we can batch updates and notify
> > once when a threshold is crossed or something.
>
> Yes, we'd batch updates.

Thanks for clarifying, looking forward to your changes :)
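
For anyone who wants to prototype the range-based mechanism discussed
above, here is a minimal, illustrative sketch of a consumer.
register_memory_notifier(), struct memory_notify and MEM_ONLINE are the
existing <linux/memory.h> API; the two "soft" actions carry the names
floated in this thread but do not exist (their values are made up), and
the function names are placeholders.

#include <linux/init.h>
#include <linux/memory.h>
#include <linux/notifier.h>

/* Proposed new actions -- hypothetical, not in <linux/memory.h>: */
#define MEM_SOFT_ONLINE		(1 << 8)
#define MEM_SOFT_OFFLINE	(1 << 9)

static int arch_mem_notifier(struct notifier_block *nb,
			     unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;

	switch (action) {
	case MEM_ONLINE:
		/* The whole Linux memory block went online. */
		break;
	case MEM_SOFT_ONLINE:
		/*
		 * Pages in [mn->start_pfn, mn->start_pfn + mn->nr_pages)
		 * became usable (!PageOffline, handed to the buddy);
		 * reinitialize arch state (e.g., on s390x) for that range.
		 */
		break;
	case MEM_SOFT_OFFLINE:
		/* Pages became unusable (PageOffline, removed from the buddy). */
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block arch_mem_nb = {
	.notifier_call = arch_mem_notifier,
};

static int __init arch_mem_notifier_init(void)
{
	return register_memory_notifier(&arch_mem_nb);
}
device_initcall(arch_mem_notifier_init);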
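
The batching idea at the very end ("batch updates and notify once when
a threshold is crossed") could take roughly the following shape for the
high-level mechanism. Again purely a sketch: the chain, the function
and the threshold are made up, while BLOCKING_NOTIFIER_HEAD(),
atomic_long_*() and blocking_notifier_call_chain() are existing kernel
primitives.

#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/notifier.h>

/* Hypothetical chain for "totalram_pages() changed a lot" events. */
static BLOCKING_NOTIFIER_HEAD(totalram_notify_chain);
static atomic_long_t totalram_pending = ATOMIC_LONG_INIT(0);

/* Arbitrary threshold: 8192 pages, i.e. 32 MiB with 4 KiB pages. */
#define TOTALRAM_NOTIFY_THRESHOLD	8192

/* Would be called with the signed page count of each adjustment. */
void totalram_notify_adjust(long nr_pages)
{
	long pending = atomic_long_add_return(nr_pages, &totalram_pending);

	if (abs(pending) < TOTALRAM_NOTIFY_THRESHOLD)
		return;

	/* Whoever wins the cmpxchg resets the counter and notifies once. */
	if (atomic_long_cmpxchg(&totalram_pending, pending, 0) == pending)
		blocking_notifier_call_chain(&totalram_notify_chain, 0, NULL);
}

One conceivable hook point would be adjust_managed_page_count(), which
balloon drivers already go through when adjusting totalram_pages().
Note that a blocking chain assumes process context; an atomic chain
would be needed if adjustments can happen under locks.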