Thanks Michal here!! On 7/19/2022 9:13 PM, Michal Hocko wrote: >>>>>> diff --git a/mm/page_ext.c b/mm/page_ext.c >>>>>> index 3dc715d..5ccd3ee 100644 >>>>>> --- a/mm/page_ext.c >>>>>> +++ b/mm/page_ext.c >>>>>> @@ -299,8 +299,9 @@ static void __free_page_ext(unsigned long pfn) >>>>>> if (!ms || !ms->page_ext) >>>>>> return; >>>>>> base = get_entry(ms->page_ext, pfn); >>>>>> - free_page_ext(base); >>>>>> ms->page_ext = NULL; >>>>>> + synchronize_rcu(); >>>>>> + free_page_ext(base); >>>>>> } >>>>> So you are imposing the RCU grace period for each page_ext! This can get >>>>> really expensive. Have you tried to measure the effect? >>> I was wrong here! This is for each memory section which is not as >>> terrible as every single page_ext. This can be still quite a lot memory >>> sections in a single memory block (e.g. on ppc memory sections are >>> ridiculously small). >>> >> On the ARM64, I see that the minimum a section size will go is 128MB. I >> think 16MB is the section size on ppc. Any inputs on how frequently >> offline/online operation is being done on this ppc arch? > I have seen several reports where 16MB sections were used on PPC LPARs > with a non trivial size. My usual answer to that is tha this is mostly a > self inflicted injury but I am told that for some reasons I cannot > udnerstand this is not easy to change. So reasonable or not this is not > all that uncommon in PPC land. > > We definitely shouldn't optimize for those setups but we shouldn't make > them suffer even more as well. Besides that it seems that a single > rcu_synchronize per offline operation should be doable. I too feel it is doable but the code might look to need to traverse the sections of a memory block twice. 1) with synchronize_rcu() calling for each memory section of a memblock: --------------------------------------------------------------------- for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) __free_page_ext(pfn): ms->page_ext = NULL synchronize_rcu();// Called on every section. free_page_ext();//free the page_ext. 2) With a single synchronize_rcu() for each offlined block: ------------------------------------------------------- for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) { __free_page_ext(pfn): ms->page_ext = NULL; } synchronize_rcu(); // call synchronize_rcu for just once for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) free_page_ext(); // Free the page_ext. Any better code you have in mind here please? But since there are few sections the overhead of traversing them will be definitely less compared to synchronize_rcu() per memsection. But I really doubt if there will be a real impact making sync_rcu per section because,(as david also mentioned and you also corrected it I think), the concern here is for ppc where the min memblock size is 256M with section size of 16M and there is a single offline operation on that block but can end up in calling 16 sync_rcu() calls. Should we really optimize this case here? If yes, I can go with the approach 2) mentioned above. Sorry if I am really undermining the problem here. Thanks, Charan