Thanks David for the inputs!! On 7/27/2022 10:59 PM, David Hildenbrand wrote: >> Fix those paths where offline races with page_ext access by maintaining >> synchronization with rcu lock and is achieved in 3 steps: >> 1) Invalidate all the page_ext's of the sections of a memory block by >> storing a flag in the LSB of mem_section->page_ext. >> >> 2) Wait till all the existing readers to finish working with the >> ->page_ext's with synchronize_rcu(). Any parallel process that starts >> after this call will not get page_ext, through lookup_page_ext(), for >> the block parallel offline operation is being performed. >> >> 3) Now safely free all sections ->page_ext's of the block on which >> offline operation is being performed. >> >> Thanks to David Hildenbrand for his views/suggestions on the initial >> discussion[1] and Pavan kondeti for various inputs on this patch. >> >> FAQ's: >> Q) Should page_ext_[get|put]() needs to be used for every page_ext >> access? >> A) NO, the synchronization is really not needed in all the paths of >> accessing page_ext. One case is where extra refcount is taken on a >> page for which memory block, this pages falls into, offline operation is >> being performed. This extra refcount makes the offline operation not to >> succeed hence the freeing of page_ext. Another case is where the page >> is already being freed and we do reset its page_owner. >> >> Some examples where the rcu_lock is not taken while accessing the >> page_ext are: >> 1) In migration (where we also migrate the page_owner information), we >> take the extra refcount on the source and destination pages and then >> start the migration. This extra refcount makes the test_pages_isolated() >> function to fail thus retry the offline operation. >> >> 2) In free_pages_prepare(), we do reset the page_owner(through page_ext) >> which again doesn't need the protection to access because the page is >> already freeing (through only one path). >> >> So, users need not to use page_ext_[get|put]() when they are sure that >> extra refcount is taken on a page preventing the offline operation. >> >> Q) Why can't the page_ext is freed in the hot_remove path, where memmap >> is also freed ? >> >> A) As per David's answers, there are many reasons and a few are: >> 1) Discussions had happened in the past to eventually also use rcu >> protection for handling pfn_to_online_page(). So doing it cleanly here >> is certainly an improvement. >> >> 2) It's not good having to scatter section online checks all over the >> place in page ext code. Once there is a difference between active vs. >> stale page ext data things get a bit messy and error prone. This is >> already ugly enough in our generic memmap handling code. >> >> 3) Having on-demand allocations, such as KASAN or page ext from the >> memory online notifier is at least currently cleaner, because we don't >> have to handle each and every subsystem that hooks into that during the >> core memory hotadd/remove phase, which primarily only setups the >> vmemmap, direct map and memory block devices. >> >> [1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@xxxxxxxxxxx/ >> > I guess if we care about the synchronize_rcu() we could go crazy with > temporary allocations for data-to-free + call_rcu(). IMO, single synchronize_rcu() call overhead shouldn't be cared especially if the memory offline operation it self is expected to complete in seconds. On the Snapdragon system, I can see the lowest it can complete in 3-4secs for a complete memory block of size 512M. And agree that this time depends on lot of other factors too but wanted to raise a point that it is really not a path where tiny optimizations should be strictly considered. __Please help in correcting me If I am really downplaying the scenario here__. But then I moved to single synchronize_rcu() just to avoid any visible effects that can cause by multiple synchronize_rcu() for a single memory block with lot of sections. Having said that, I am open to go for call_rcu() and infact it will be a much simple change where I can do the freeing of page_ext in the __free_page_ext() itself which is called for every section there by avoid the extra tracking flag PAGE_EXT_INVALID. ........... WRITE_ONCE(ms->page_ext, NULL); call_rcu(rcu_head, fun); // Free in fun() ............. Or your opinion is to use call_rcu () only once in place of synchronize_rcu() after invalidating all the page_ext's of memory block? Thanks, Charan