Re: [PATCH] [RFC] init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set

On Mon, 23 Nov 2020 19:05:00 +0800 Lin Feng <linf@xxxxxxxxxx> wrote:

> During boot, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
> we have the following call chain:
> 
> start_kernel
> ...
>   mm_init
>     mem_init
>      memblock_free_all
>        reset_all_zones_managed_pages
>        free_low_memory_core_early
> ...
>   buffer_init
>     nr_free_buffer_pages
>       zone->managed_pages
> ...
>   rest_init
>     kernel_init
>       kernel_init_freeable
>         page_alloc_init_late
>           kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
>           wait_for_completion(&pgdat_init_all_done_comp);
>           ...
>           files_maxfiles_init
> 
> It's clear that buffer_init depends on zone->managed_pages, but that
> counter is reset in reset_all_zones_managed_pages, after which pages are
> re-added to zone->managed_pages. When buffer_init runs, this process is
> only half done; most pages are not added until deferred_init_memmap is done.
> On large-memory machines the count from nr_free_buffer_pages drifts too
> much, and it also varies from kernel to kernel on the same hardware.
> 
> The fix is simple: delay buffer_init until deferred_init_memmap has finished.
> 
> But with this patch applied, max_buffer_heads becomes very large,
> roughly 4 times totalram_pages, from the formula:
> max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct buffer_head));
> 
> Say in a 64GB memory box we have 16777216 pages, then max_buffer_heads
> turns out to be roughly 67,108,864.
> In the common case, isn't a buffer_head mapped to one page/block (4KB)?
> If so, the number of buffer_heads in use never exceeds totalram_pages.
> IMO this is likely to make the buffer_heads_over_limit bool always false,
> which makes the 'if (buffer_heads_over_limit)' test in vmscan unnecessary.
> Correct me if it's not true.

I agree - seems that on such a system we'll allow enough buffer_heads
to manage about 250GB worth of pagecache, for a 4kb filesystem
blocksize.
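
For anyone who wants to check that arithmetic, here is a minimal sketch
of the calculation. It is not the kernel code itself: it assumes 4KB
pages and a 104-byte struct buffer_head (a typical 64-bit size, not a
figure stated in this thread), and the sketch_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, not taken from the thread: 4 KiB pages and a
 * 104-byte struct buffer_head. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_BH_SIZE   104ULL

/* Cap buffer_heads at 10% of the given page count, following the
 * quoted formula. Integer division is used throughout. */
static inline uint64_t sketch_max_buffer_heads(uint64_t free_pages)
{
	uint64_t nrpages = free_pages * 10 / 100;

	return nrpages * (SKETCH_PAGE_SIZE / SKETCH_BH_SIZE);
}

/* Bytes of pagecache that max_bh buffer_heads can describe when each
 * buffer_head maps exactly one filesystem block. */
static inline uint64_t sketch_pagecache_bytes(uint64_t max_bh,
					      uint64_t block_size)
{
	return max_bh * block_size;
}
```

Under those assumptions, 16777216 pages gives 65,431,119 buffer_heads
(about 3.9 times the page count), and with a 4096-byte block size that
covers 268,005,863,424 bytes, which is just under 250GiB.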

Perhaps this code is all a remnant of highmem systems, where
ZONE_NORMAL is considerably smaller than ZONE_HIGHMEM, and we don't
want to be consuming all of ZONE_NORMAL for highmem-attached
buffer_heads.

I'm not sure that it's all very harmful - we don't *need* to be
trimming away at the buffer_heads on a 64GB 64-bit system so the code is
really only functional on highmem machines.  And as far as I know, it
works OK on such machines.




