Thanks Michal!! On 8/15/2022 8:36 PM, Michal Hocko wrote: > On Tue 09-08-22 20:16:43, Charan Teja Kalla wrote: > [...] >> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h >> index fabb2e1..0e259da 100644 >> --- a/include/linux/page_ext.h >> +++ b/include/linux/page_ext.h > [...] >> @@ -87,5 +83,14 @@ static inline void page_ext_init_flatmem_late(void) >> static inline void page_ext_init_flatmem(void) >> { >> } >> + >> +static inline struct page *page_ext_get(struct page *page) > struct page_ext * > oops!! It didn't get caught as this is in !CONFIG_PAGE_EXTENSION. >> +{ >> + return NULL; >> +} >> + >> +static inline void page_ext_put(void) >> +{ >> +} >> #endif /* CONFIG_PAGE_EXTENSION */ >> #endif /* __LINUX_PAGE_EXT_H */ > [...] >> diff --git a/mm/page_ext.c b/mm/page_ext.c >> index 3dc715d..91d7bd2 100644 >> --- a/mm/page_ext.c >> +++ b/mm/page_ext.c >> @@ -9,6 +9,7 @@ >> #include <linux/page_owner.h> >> #include <linux/page_idle.h> >> #include <linux/page_table_check.h> >> +#include <linux/rcupdate.h> >> >> /* >> * struct page extension >> @@ -59,6 +60,10 @@ >> * can utilize this callback to initialize the state of it correctly. >> */ >> >> +#ifdef CONFIG_SPARSEMEM >> +#define PAGE_EXT_INVALID (0x1) >> +#endif >> + >> #if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) >> static bool need_page_idle(void) >> { >> @@ -84,6 +89,7 @@ static struct page_ext_operations *page_ext_ops[] __initdata = { >> unsigned long page_ext_size = sizeof(struct page_ext); >> >> static unsigned long total_usage; >> +static struct page_ext *lookup_page_ext(const struct page *page); >> >> static bool __init invoke_need_callbacks(void) >> { >> @@ -125,6 +131,37 @@ static inline struct page_ext *get_entry(void *base, unsigned long index) >> return base + page_ext_size * index; >> } >> >> +/* >> + * This function gives proper page_ext of a memory section >> + * during race with the offline operation on a memory block >> + * this section falls into. Not using this function to get >> + * page_ext of a page, in code paths where extra refcount >> + * is not taken on that page eg: pfn walking, can lead to >> + * use-after-free access of page_ext. > I do not think this is really useful comment, it goes into way too much > detail about memory hotplug yet not enough to actually understand the > interaction because there are no references to the actual > synchronization scheme. I would go with something like: > > /* > * Get a page_ext associated with the given page. Returns NULL if > * no such page_ext exists otherwise ensures that the page_ext will > * stay alive until page_ext_put is called. > * This implies a non-sleeping context. > */ Will update as per the Matthew input @ https://lore.kernel.org/all/YvplthTjM8Ez5DIq@xxxxxxxxxxxxxxxxxxxx/ >> + */ >> +struct page_ext *page_ext_get(struct page *page) >> +{ >> + struct page_ext *page_ext; >> + >> + rcu_read_lock(); >> + page_ext = lookup_page_ext(page); >> + if (!page_ext) { >> + rcu_read_unlock(); >> + return NULL; >> + } >> + >> + return page_ext; >> +} >> + >> +/* >> + * Must be called after work is done with the page_ext received >> + * with page_ext_get(). >> + */ >> + >> +void page_ext_put(void) >> +{ >> + rcu_read_unlock(); >> +} > Thinking about this some more I am not sure this is a good interface. It > doesn't have any reference to the actual object this is called for. This > is nicely visible in __folio_copy_owner which just calles page_ext_put() > twice because there are 2 page_exts and I can already see how somebody > might get confused this is just an error and send a patch to drop one of > them. > > I do understand why you went this way because having a parameter which > is not used will likely lead to the same situation. On the other hand it > could be annotated to not raise warnings. One potential way to > workaround that would be > > void page_ext_put(struct page_ext *page_ext) > { > if (unlikely(!page_ext)) > return; > > rcu_read_unlock(); > } > > which would help to make the api slightly more robust in case somebody > does page_ext_put in a branch where page_ext_get returns NULL. > Looks better. Will change this accordingly. > No strong opinion on that though. WDYI? > >> #ifndef CONFIG_SPARSEMEM >> >> > [...] >> @@ -183,19 +184,26 @@ static inline void __set_page_owner_handle(struct page_ext *page_ext, >> noinline void __set_page_owner(struct page *page, unsigned short order, >> gfp_t gfp_mask) >> { >> - struct page_ext *page_ext = lookup_page_ext(page); >> + struct page_ext *page_ext = page_ext_get(page); >> depot_stack_handle_t handle; >> >> if (unlikely(!page_ext)) >> return; > Either add a comment like this > /* save_stack can sleep in general so we have to page_ext_put */ Vlastimil suggested to go for save stack first since !page_ext is mostly unlikely. Snip from his comments: Why not simply do the save_stack() first and then page_ext_get() just once? It should be really rare that it's NULL, so I don't think we save much by avoiding an unnecessary save_stack(), while the overhead of doing two get/put instead of one will affect every call. https://lore.kernel.org/all/f5fd4942-b03e-1d1c-213b-9cd5283ced91@xxxxxxx/ >> + page_ext_put(); >> >> handle = save_stack(gfp_mask); > or just drop the initial page_ext_get altogether. This function is > called only when page_ext is supposed to be initialized and !page_ext > case above should be very unlikely. Or is there any reason to keep this? > >> + >> + /* Ensure page_ext is valid after page_ext_put() above */ >> + page_ext = page_ext_get(page); >> + if (unlikely(!page_ext)) >> + return; >> __set_page_owner_handle(page_ext, handle, order, gfp_mask); >> + page_ext_put(); >> } >> > [...] >> @@ -558,13 +590,17 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos) >> */ >> handle = READ_ONCE(page_owner->handle); >> if (!handle) >> - continue; >> + goto loop; >> >> /* Record the next PFN to read in the file offset */ >> *ppos = (pfn - min_low_pfn) + 1; >> >> + memcpy(&page_owner_tmp, page_owner, sizeof(struct page_owner)); >> + page_ext_put(); > why not > page_owner_tmp = *page_owner; Done!! > >> return print_page_owner(buf, count, pfn, page, >> - page_owner, handle); >> + &page_owner_tmp, handle); >> +loop: >> + page_ext_put(); >> } >> >> return 0;