Re: [PATCH v4 2/2] mm: Init page count in reserve_bootmem_region when MEMINIT_EARLY

Yajun Deng <yajun.deng@xxxxxxxxx> · Tue, 17 Oct 2023 17:58:24 +0800

On 2023/10/16 18:17, Yajun Deng wrote:

On 2023/10/16 16:36, David Hildenbrand wrote:
On 16.10.23 10:32, Yajun Deng wrote:

On 2023/10/16 16:16, David Hildenbrand wrote:
On 16.10.23 10:10, Yajun Deng wrote:

On 2023/10/16 14:33, Mike Rapoport wrote:
On Fri, Oct 13, 2023 at 05:29:19PM +0800, Yajun Deng wrote:
On 2023/10/13 16:48, Mike Rapoport wrote:
On Thu, Oct 12, 2023 at 05:53:22PM +0800, Yajun Deng wrote:
On 2023/10/12 17:23, David Hildenbrand wrote:
On 10.10.23 04:31, Yajun Deng wrote:
On 2023/10/8 16:57, Yajun Deng wrote:
That looks wrong. if the page count would by pure luck be 0
already for hotplugged memory, you wouldn't clear the 
reserved
flag.

These changes make me a bit nervous.
Is 'if (page_count(page) || PageReserved(page))' be safer? Or
do I
need to do something else?

How about the following if statement? But it needs to add more
patch
like v1 ([PATCH 2/4] mm: Introduce MEMINIT_LATE context).

It'll be safer, but more complex. Please comment...

        if (context != MEMINIT_EARLY || (page_count(page) ||
PageReserved(page)) {

Ideally we could make initialization only depend on the context,
and not
check for count or the reserved flag.

This link is v1,
https://lore.kernel.org/all/20230922070923.355656-1-yajun.deng@xxxxxxxxx/ 

If we could make initialization only depend on the context, I'll
modify it
based on v1.
Although ~20% improvement looks impressive, this is only
optimization of a
fraction of the boot time, and realistically, how much 56 msec
saves from
the total boot time when you boot a machine with 190G of RAM?
There are a lot of factors that can affect the total boot time. 56
msec
saves may be insignificant.

But if we look at the boot log, we'll see there's a significant
time jump.

before:

[    0.250334] ACPI: PM-Timer IO Port: 0x508
[    0.618994] Memory: 173413056K/199884452K available (18440K
kernel code,

after:

[    0.260229] software IO TLB: area num 32.
[    0.563497] Memory: 173413056K/199884452K available (18440K
kernel code,
Memory:
Memory initialization is time consuming in the boot log.
You just confirmed that 56 msec is insignificant and then you send
again
the improvement of ~60 msec in memory initialization.

What does this improvement gain in percentage of total boot time?

before:

[   10.692708] Run /init as init process

after:

[   10.666290] Run /init as init process

About 0.25%. The total boot time is variable, depending on how many
drivers need to be initialized.

I still think the improvement does not justify the churn, added
complexity
and special casing of different code paths of initialization of
struct pages.

Because there is a loop, if the order is MAX_ORDER, the loop will
run 1024
times. The following 'if' would be safer:

'if (context != MEMINIT_EARLY || (page_count(page) || >>
PageReserved(page))
{'
No, it will not.

As the matter of fact any condition here won't be 'safer' because it
makes
the code more complex and less maintainable.
Any future change in __free_pages_core() or one of it's callers will
have
to reason what will happen with that condition after the change.

To avoid introducing MEMINIT_LATE context and make code simpler. This
might be a better option.

if (page_count(page) || PageReserved(page))

I'll have to side with Mike here; this change might not be worth it.

Okay, I got it. Thanks!

IMHO instead of adding more checks to that code we should try to 
unify that handling such that we can just remove it. As expressed, at 
least from the memory hotplug perspective there are still reasons why 
we need that; I can provide some guidance on how to eventually 
achieve that, but it might end up in a bit of work ...

Yes, we can't remove it right now. If we want to do that, we have to 
clean up rely on page count and PageReserved first.

How about making __free_pages_core separate, like:

void __init __free_pages_core_early(struct page *page, unsigned int order)
{
        unsigned int nr_pages = 1 << order;

        atomic_long_add(nr_pages, &page_zone(page)->managed_pages);

        if (page_contains_unaccepted(page, order)) {
                if (order == MAX_ORDER && __free_unaccepted(page))
                        return;

                accept_page(page, order);
        }

        /*
         * Bypass PCP and place fresh pages right to the tail, primarily
         * relevant for memory onlining.
         */
        __free_pages_ok(page, order, FPI_TO_TAIL);
}

void __free_pages_core(struct page *page, unsigned int order)
{
        unsigned int nr_pages = 1 << order;
        struct page *p = page;
        unsigned int loop;

        /*
         * When initializing the memmap, __init_single_page() sets the 
refcount
         * of all pages to 1 ("allocated"/"not free"). We have to set the
         * refcount of all involved pages to 0.
         */
        prefetchw(p);
        for (loop = 0; loop < (nr_pages - 1); loop++, p++) {
                prefetchw(p + 1);
                __ClearPageReserved(p);
                set_page_count(p, 0);
        }
        __ClearPageReserved(p);
        set_page_count(p, 0);

        __free_pages_core_early(page, order);
}

We only change the caller we need to __free_pages_core_early, it doesn't 
affect other callers.

Anyhow, thanks for bringing up that topic; it reminded me that I 
still have pending cleanups to not rely on PageReserved on the memory 
hotplug path.