On Tuesday 06 October 2015 11:06 AM, Vineet Gupta wrote:
> On Tuesday 06 October 2015 03:40 AM, Andrew Morton wrote:
>> On Sat, 3 Oct 2015 18:25:13 +0530 Vineet Gupta <Vineet.Gupta1@xxxxxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> I noticed increased boot time when enabling highmem for ARC. Turns out that
>>> freeing highmem pages into buddy allocator is done page at a time, while it is
>>> batched for low mem pages. Below is call flow.
>>>
>>> I'm thinking of writing free_highmem_pages() which takes start and end pfn and
>>> want to solicit some ideas whether to write it from scratch or preferably call
>>> existing __free_pages_memory() to reuse the logic to convert a pfn range into
>>> {pfn, order} tuples.
>>>
>>> For latter however there are semantical differences as you can see below which I'm
>>> not sure of:
>>> -highmem page->count is set to 1, while 0 for low mem
>> That would be weird.
>>
>> Look more closely at __free_pages_boot_core() - it uses
>> set_page_refcounted() to set the page's refcount to 1. Those
>> set_page_count() calls look superfluous to me.
> If you closer still, set_page_refcounted() is called outside the loop for the
> first page only. For all pages, loop iterator sets them to 1. Turns out there's
> more fun here....
>
> I ran this under a debugger and much earlier in boot process, there's existing
> setting of page count to 1 for *all* pages of *all* zones (include highmem pages).
> See call flow below.
>
> free_area_init_node
>     free_area_init_core
>         loops thru all zones
>             memmap_init_zone
>                 loops thru all pages of zones
>                     __init_single_page
>
> This means the subsequent setting of page count to 0 (or 1 for the special first
> page) is superfluous - actually buggy at best. I will send a patch to fix that. I
> hope I don't break some obscure init path which doesn't hit the above init.

So I took a stab at it and broke it royally. I was too naive for this to begin with.

The explicit setting to 1 for high mem pages, and to 0 for all low mem pages
except the first page in @order (which gets 1), is all by design.
__free_pages(), called by both code paths, always decrements the refcount of
struct page. In the case of a page batch (order != 0) it only decrements the
first page's refcount. This was my find of the month - but you have probably
known this for the longest time! Live and learn.

The current high mem path only uses order == 0, so an initial refcount of 1 is
needed (although the one set from __init_single_page() is sufficient - no need
to do that again in free_highmem_page()).

The low mem pages though typically call __free_pages() with order > 0, thus the
caller carefully sets up the first page in @order with refcount 1 (using
set_page_refcounted()), while the rest of the pages are set to refcount 0 in
the loop. Thus the seemingly redundant setting to 0 is fine IMHO - perhaps
better to document it - assuming I got it right so far.

>>> -atomic clearing of page reserved flag vs. non atomic
>> I doubt if the atomic is needed - who else can be looking at this page
>> at this time?
> I'll send another one to separately fix that as well. Seems like boot mem setup is
> a relatively neglect part of kernel.
>
> -Vineet
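
For reference, the refcount convention described above can be modelled in a few lines of userspace C. This is only a toy sketch, not kernel code: struct toy_page, toy_free_pages() and the other toy_* names are hypothetical stand-ins, and the only behaviour modelled is the one stated above (the free path decrements the first page's count and releases the whole 2^order block when it reaches zero).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the refcount matters here. */
struct toy_page {
	int count;
};

/* Number of pages handed to the (imaginary) buddy allocator so far. */
static unsigned long toy_freed_pages;

/*
 * Models the behaviour described above: only the first page's refcount
 * is decremented, and the whole 2^order block is released when it
 * drops to zero.
 */
static void toy_free_pages(struct toy_page *page, unsigned int order)
{
	assert(page->count > 0);
	if (--page->count == 0)
		toy_freed_pages += 1UL << order;
}

/* Low mem style: batch free, so only the first page may hold a count of 1. */
static void toy_free_lowmem_block(struct toy_page *first, unsigned int order)
{
	unsigned long nr = 1UL << order;
	unsigned long i;

	for (i = 0; i < nr; i++)
		first[i].count = 0;	/* the "seemingly redundant" setting to 0 */
	first->count = 1;		/* role played by set_page_refcounted() */
	toy_free_pages(first, order);
}

/* High mem style: one page at a time, so every page needs a count of 1. */
static void toy_free_highmem_page(struct toy_page *page)
{
	page->count = 1;		/* in the real flow this is already 1 from init */
	toy_free_pages(page, 0);
}
```

In this model, freeing an order-3 low mem block whose pages all start at count 1 ends with every page at 0 and 8 pages released. Skipping the loop that zeroes the tail pages would leave them at 1 inside a freed block, which is roughly how my attempt went wrong.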