Re: [PATCH] mm: increase totalram_pages on freeing to buddy system

On 03.06.24 22:01, Wei Yang wrote:
On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
On 02.06.24 02:58, Wei Yang wrote:
On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
On 01.06.24 17:32, David Hildenbrand wrote:
On 01.06.24 15:34, Wei Yang wrote:
Total memory represents pages managed by the buddy system.

No, that's managed pages.

After the
introduction of DEFERRED_STRUCT_PAGE_INIT, it may count pages before
they are managed.


I recall one reason this is done: so other subsystems know the total
memory size even before deferred init is done.

free_low_memory_core_early() returns the count of all free pages, even
though at this moment only early-initialized pages have been freed to the
buddy system. This means the total memory reported at this moment is not
correct.

Let's increase it when pages are freed to buddy system.

I'm missing the "why", and the very first sentence of this patch is wrong.

Correction: your statement was correct :) That's why
adjust_managed_page_count() adjusts that as well.

__free_pages_core() only adjusts managed page count, because it assumes
totalram has already been adjusted early during boot.

The reason we have this split for now, I think, is because of subsystems that
call totalram_pages() during init.

So the "why" question remains, because this change has the potential to break
other stuff.


Thanks, I didn't notice this.

I think having your cleanup would be very nice, as I have patches in the
works that would benefit from being able to move the totalram update from
memory hotplug code to __free_pages_core().


I got the same feeling.

We'd have to make sure that no code relies on totalram being sane/fixed
during boot for the initial memory. I think right now we might have such
code.


One concern is that totalram would change while hotplug is enabled. That
sounds like such code should re-calculate after totalram changes?

We don't have such code in place -- there were discussions regarding that recently.

It would be reasonable to take a look at all totalram_pages() users and determine whether they could be affected by deferring the update.

At least page_alloc_init_late()->deferred_init_memmap() happens before do_basic_setup()->do_initcalls(), which is good.

So maybe it's not a big concern and this separate totalram pages accounting is much rather some legacy leftover.


Further, we currently require only a single atomic RMW instruction to adjust
totalram during boot; moving it to __free_pages_core() would imply more
atomics, but usually only one per MAX_ORDER page, so I doubt this would make
a big difference.


I did a rough calculation on this. One MAX_ORDER page accounts for 2MB, and
with deferred init only the low zone's memory is initialized during boot. Per
my understanding, the low zone's memory is 4GB on x86. So the extra atomic
operations come to 4GB / 2MB = 2K.

Well, you would now also require these for all deferred-initialized memory -- likewise if deferred init were disabled. Sounds like an interesting measurement, if it is measurable at all.

--
Cheers,

David / dhildenb




