On Tue, 22 Oct 2019, Waiman Long wrote:

> >>> and used nr_free to compute the missing count. Since MIGRATE_MOVABLE
> >>> is usually the largest one on large memory systems, this is the one
> >>> to be skipped. Since the printing order is migration-type => order, we
> >>> will have to store the counts in an internal 2D array before printing
> >>> them out.
> >>>
> >>> Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the
> >>> zone lock for too long blocking out other zone lock waiters from being
> >>> run. This can be problematic for systems with large amount of memory.
> >>> So a check is added to temporarily release the lock and reschedule if
> >>> more than 64k of list entries have been iterated for each order. With
> >>> a MAX_ORDER of 11, the worst case will be iterating about 700k of list
> >>> entries before releasing the lock.
> >> But you are still iterating through the whole free_list at once so if it
> >> gets really large then this is still possible. I think it would be
> >> preferable to use per migratetype nr_free if it doesn't cause any
> >> regressions.
> >>
> > Yes, it is still theoretically possible. I will take a further look at
> > having per-migrate type nr_free. BTW, there is one more place where the
> > free lists are being iterated with zone lock held - mark_free_pages().
>
> Looking deeper into the code, the exact migration type is not stored in
> the page itself. An initial movable page can be stolen to be put into
> another migration type. So in a delete or move from free_area, we don't
> know exactly what migration type the page is coming from. IOW, it is
> hard to get accurate counts of the number of entries in each lists.
>

I think the suggestion is to maintain a nr_free count of the free_list for
each order for each migratetype so anytime a page is added or deleted from
the list, the nr_free is adjusted.  Then the free_area's nr_free becomes
the sum of its migratetype's nr_free at that order.
That's possible to do if you track the migratetype per page, as you said,
or like pcp pages track it as part of page->index.  It's a trade-off on
whether you want to impact the performance of maintaining these new
nr_frees anytime you manipulate the freelists.

I think Vlastimil and I discussed per order per migratetype nr_frees in
the past and it could be a worthwhile improvement for other reasons,
specifically it leads to heuristics that can be used to determine how
fragmented a certain migratetype is for a zone, i.e. a very quick way to
determine what ratio of pages over all MIGRATE_UNMOVABLE pageblocks are
free.

Or maybe there are other reasons why these nr_frees can't be maintained
anymore?  (I had a patch to do it on 4.3.)

You may also find systems where MIGRATE_MOVABLE is not actually the
longest free_list compared to other migratetypes on a severely fragmented
system, so special casing MIGRATE_MOVABLE might not be the best way
forward.
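
To make the bookkeeping concrete, here is a rough userspace sketch of per
order per migratetype counters.  It is only an illustration, not the 4.3
patch and not kernel code: zone_model, free_area_model, the model_*()
helpers and the reduced migratetype enum are all hypothetical names made
up for this example, and it assumes the caller already knows which
migratetype list a page is on, which is the hard part raised above.

```c
#include <assert.h>

#define MAX_ORDER 11	/* value quoted earlier in the thread */

/* Reduced, hypothetical set of migratetypes for illustration only. */
enum migratetype_model {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_TYPES
};

/* One counter per migratetype instead of a single per-order nr_free. */
struct free_area_model {
	unsigned long nr_free_mt[MIGRATE_TYPES];
};

struct zone_model {
	struct free_area_model free_area[MAX_ORDER];
};

/* Adjust the counter whenever an entry is added to a free list. */
static void model_add(struct zone_model *z, unsigned int order,
		      enum migratetype_model mt)
{
	z->free_area[order].nr_free_mt[mt]++;
}

/* Adjust the counter whenever an entry is deleted from a free list. */
static void model_del(struct zone_model *z, unsigned int order,
		      enum migratetype_model mt)
{
	/* Assumes mt is the list the page is really on. */
	assert(z->free_area[order].nr_free_mt[mt] > 0);
	z->free_area[order].nr_free_mt[mt]--;
}

/* Moving an entry between lists (e.g. stealing) touches two counters. */
static void model_move(struct zone_model *z, unsigned int order,
		       enum migratetype_model from, enum migratetype_model to)
{
	model_del(z, order, from);
	model_add(z, order, to);
}

/* The per-order nr_free becomes the sum over all migratetypes. */
static unsigned long area_nr_free(const struct free_area_model *area)
{
	unsigned long sum = 0;
	int mt;

	for (mt = 0; mt < MIGRATE_TYPES; mt++)
		sum += area->nr_free_mt[mt];
	return sum;
}

/* Free base pages of one migratetype, without walking any free list. */
static unsigned long free_pages_of_type(const struct zone_model *z,
					enum migratetype_model mt)
{
	unsigned long pages = 0;
	unsigned int order;

	for (order = 0; order < MAX_ORDER; order++)
		pages += z->free_area[order].nr_free_mt[mt] << order;
	return pages;
}
```

With counters like these, the fragmentation heuristic would be
free_pages_of_type() divided by the number of pages in pageblocks of that
migratetype.  The denominator is not modelled here, and the cost is the
extra increment/decrement on every freelist manipulation.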