On Mon, 23 Nov 2020 19:05:00 +0800 Lin Feng <linf@xxxxxxxxxx> wrote: > In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set, > we have following callchain: > > start_kernel > ... > mm_init > mem_init > memblock_free_all > reset_all_zones_managed_pages > free_low_memory_core_early > ... > buffer_init > nr_free_buffer_pages > zone->managed_pages > ... > rest_init > kernel_init > kernel_init_freeable > page_alloc_init_late > kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); > wait_for_completion(&pgdat_init_all_done_comp); > ... > files_maxfiles_init > > It's clear that buffer_init depends on zone->managed_pages, but it's reset > in reset_all_zones_managed_pages after that pages are readded into > zone->managed_pages, but when buffer_init runs this process is half done > and most of them will finally be added till deferred_init_memmap done. > In large memory couting of nr_free_buffer_pages drifts too much, also > drifting from kernels to kernels on same hardware. > > Fix is simple, it delays buffer_init run till deferred_init_memmap all done. > > But as corrected by this patch, max_buffer_heads becomes very large, > the value is roughly as many as 4 times of totalram_pages, formula: > max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct buffer_head)); > > Say in a 64GB memory box we have 16777216 pages, then max_buffer_heads > turns out to be roughly 67,108,864. > In common cases, should a buffer_head be mapped to one page/block(4KB)? > So max_buffer_heads never exceeds totalram_pages. > IMO it's likely to make buffer_heads_over_limit bool value alwasy false, > then make codes 'if (buffer_heads_over_limit)' test in vmscan unnecessary. > Correct me if it's not true. I agree - seems that on such a system we'll allow enough buffer_heads to manage about 250GB worth of pagecache, for a 4kb filesystem blocksize. Perhaps this code is all a remnant of highmem systems, where ZONE_NORMAL is considerably smaller than ZONE_HIGHMEM, and we don't want to be consuming all of ZONE_NORMAL for highmem-attached buffer_heads. I'm not sure that it's all very harmful - we don't *need* to be trimming away at the buffer_heads on a 64GB 4-bit system so the code is really only functional on highmem machines. And as far as I know, it works OK on such machines.