On Thu 03-08-17 20:11:58, Wei Wang wrote:
> On 08/03/2017 07:28 PM, Michal Hocko wrote:
> >On Thu 03-08-17 19:27:19, Wei Wang wrote:
> >>On 08/03/2017 06:44 PM, Michal Hocko wrote:
> >>>On Thu 03-08-17 18:42:15, Wei Wang wrote:
> >>>>On 08/03/2017 05:11 PM, Michal Hocko wrote:
> >>>>>On Thu 03-08-17 14:38:18, Wei Wang wrote:
> >>>[...]
> >>>>>>+static int report_free_page_block(struct zone *zone, unsigned int order,
> >>>>>>+                                  unsigned int migratetype, struct page **page)
> >>>>>This is just too ugly and wrong actually. Never provide struct page
> >>>>>pointers outside of the zone->lock. What I've had in mind was to simply
> >>>>>walk free lists of the suitable order and call the callback for each one.
> >>>>>Something as simple as
> >>>>>
> >>>>>    for (i = 0; i < MAX_NR_ZONES; i++) {
> >>>>>        struct zone *zone = &pgdat->node_zones[i];
> >>>>>
> >>>>>        if (!populated_zone(zone))
> >>>>>            continue;
> >>>>>        spin_lock_irqsave(&zone->lock, flags);
> >>>>>        for (order = min_order; order < MAX_ORDER; ++order) {
> >>>>>            struct free_area *free_area = &zone->free_area[order];
> >>>>>            enum migratetype mt;
> >>>>>            struct page *page;
> >>>>>
> >>>>>            if (!free_area->nr_pages)
> >>>>>                continue;
> >>>>>
> >>>>>            for_each_migratetype_order(order, mt) {
> >>>>>                list_for_each_entry(page,
> >>>>>                        &free_area->free_list[mt], lru) {
> >>>>>
> >>>>>                    pfn = page_to_pfn(page);
> >>>>>                    visit(opaque2, pfn, 1<<order);
> >>>>>                }
> >>>>>            }
> >>>>>        }
> >>>>>
> >>>>>        spin_unlock_irqrestore(&zone->lock, flags);
> >>>>>    }
> >>>>>
> >>>>>[...]
> >>>>I think the above would take the lock for too long. That's why we
> >>>>prefer to take one free page block each time, and taking it one by one
> >>>>also doesn't make a difference, in terms of the performance that we
> >>>>need.
> >>>I think you should start with the simple approach and improve incrementally
> >>>if this turns out to be not optimal. I really detest taking struct pages
> >>>outside of the lock. You never know what might happen after the lock is
> >>>dropped. E.g. can you race with the memory hotremove?
> >>
> >>The caller won't use pages returned from the function, so I think there
> >>shouldn't be an issue or race if the returned pages are used (i.e. not free
> >>anymore) or simply gone due to hotremove.
> >No, this is just too error prone. Consider that the struct page pointer
> >itself could get invalid in the meantime. Please always keep robustness
> >in mind first. Optimizations are nice but it is not even clear whether
> >the simple variant will cause any problems.
>
>
> how about this:
>
> for_each_populated_zone(zone) {
>     for_each_migratetype_order_decend(min_order, order, type) {
>         do {
>     =>      spin_lock_irqsave(&zone->lock, flags);
>             ret = report_free_page_block(zone, order, type,
>                                          &page);
>             if (!ret) {
>                 pfn = page_to_pfn(page);
>                 nr_pages = 1 << order;
>                 visit(opaque1, pfn, nr_pages);
>             }
>     =>      spin_unlock_irqrestore(&zone->lock, flags);
>         } while (!ret);
>     }
> }
>
> In this way, we can still keep the lock granularity at one free page block
> while having the struct page operated under the lock.

How can you continue iteration of the free_list after the lock has been
dropped? If you want to keep the lock held for each migrate type then
why not. Just push the lock inside the for_each_migratetype_order loop
from my example.
-- 
Michal Hocko
SUSE Labs
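
To illustrate the last suggestion (take the lock once per migrate type
free list, rather than once per zone or once per free block), here is a
minimal user-space sketch. It is not kernel code: every name in it
(model_zone, model_lock, walk_free_blocks, count_pages, the MODEL_*
constants) is hypothetical, the lock is a plain flag standing in for
zone->lock, and the free lists are simple singly linked lists. The only
point is that each list is walked from head to tail while the lock is
held, so no block pointer is ever used after the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for this model only; the kernel's values differ. */
#define MODEL_MAX_ORDER     4
#define MODEL_MIGRATE_TYPES 3

struct model_block {
	unsigned long pfn;
	struct model_block *next;
};

struct model_zone {
	int locked;                 /* stand-in for zone->lock */
	unsigned long acquisitions; /* how many times the lock was taken */
	struct model_block *free_list[MODEL_MAX_ORDER][MODEL_MIGRATE_TYPES];
};

typedef void (*visit_fn)(void *opaque, unsigned long pfn,
			 unsigned long nr_pages);

static void model_lock(struct model_zone *zone)
{
	assert(!zone->locked);
	zone->locked = 1;
	zone->acquisitions++;
}

static void model_unlock(struct model_zone *zone)
{
	assert(zone->locked);
	zone->locked = 0;
}

static void model_zone_init(struct model_zone *zone)
{
	zone->locked = 0;
	zone->acquisitions = 0;
	for (unsigned int order = 0; order < MODEL_MAX_ORDER; order++)
		for (unsigned int mt = 0; mt < MODEL_MIGRATE_TYPES; mt++)
			zone->free_list[order][mt] = NULL;
}

/* Put a block at the head of one free list, under the lock. */
static void model_zone_add_free(struct model_zone *zone, unsigned int order,
				unsigned int mt, struct model_block *blk)
{
	assert(order < MODEL_MAX_ORDER && mt < MODEL_MIGRATE_TYPES);
	model_lock(zone);
	blk->next = zone->free_list[order][mt];
	zone->free_list[order][mt] = blk;
	model_unlock(zone);
}

/*
 * Walk every free list of at least min_order. The lock is taken once
 * per (order, migrate type) list and held for the whole list walk, so
 * the iteration never resumes from a pointer obtained before a drop.
 */
static void walk_free_blocks(struct model_zone *zone, unsigned int min_order,
			     visit_fn visit, void *opaque)
{
	for (unsigned int order = min_order; order < MODEL_MAX_ORDER; order++) {
		for (unsigned int mt = 0; mt < MODEL_MIGRATE_TYPES; mt++) {
			struct model_block *blk;

			model_lock(zone);
			for (blk = zone->free_list[order][mt]; blk;
			     blk = blk->next)
				visit(opaque, blk->pfn, 1UL << order);
			model_unlock(zone);
		}
	}
}

/* Example callback: sum the number of pages reported. */
static void count_pages(void *opaque, unsigned long pfn,
			unsigned long nr_pages)
{
	(void)pfn;
	*(unsigned long *)opaque += nr_pages;
}
```

In this model the hold time is bounded by the length of a single free
list, which sits between the two variants discussed above. Whether that
granularity is adequate in the real allocator is a separate question the
sketch does not answer.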