Hello Kirill, On Tue, Oct 06, 2015 at 06:24:01PM +0300, Kirill A. Shutemov wrote: > +static unsigned long deferred_split_scan(struct shrinker *shrink, > + struct shrink_control *sc) > +{ > + unsigned long flags; > + LIST_HEAD(list), *pos, *next; > + struct page *page; > + int split = 0; > + > + spin_lock_irqsave(&split_queue_lock, flags); > + list_splice_init(&split_queue, &list); > + > + /* Take pin on all head pages to avoid freeing them under us */ > + list_for_each_safe(pos, next, &list) { > + page = list_entry((void *)pos, struct page, mapping); > + page = compound_head(page); > + /* race with put_compound_page() */ > + if (!get_page_unless_zero(page)) { > + list_del_init(page_deferred_list(page)); > + split_queue_len--; > + } > + } > + spin_unlock_irqrestore(&split_queue_lock, flags); While rebasing I noticed this loop looks a bit too heavy. There's no lockbreak and no cap on the list size, and million of THP pages could have been partially unmapped but not be entirely freed yet, and sit there for a while (there are other scenarios but this is the one that could more realistically happen with certain allocators). Then as result of random memory pressure we'd be calling millions of get_page_unless_zero across multiple NUMA nodes thrashing cachelines at every list entry, with irq disabled too for the whole period. I haven't verified it, but I guess that in some large NUMA (i.e. 4TiB) system that could take down a CPU for a second or more with irq disabled. I think it needs to isolate a certain number of pages, not splice (userland programs can invoke the shrinker through direct reclaim too and they can't stuck there for too long) and perhaps use sc->nr_to_scan to achieve that. The split_queue can also be moved from global to the "struct pglist_data" and then you can do NODE_DATA(sc->nid)->split_queue, same for the spinlock. That will make it more scalable for the lock and more efficient in freeing memory so we don't split THP from nodes reclaim isn't currently interested about (reclaim will later try again on the zones in the other nodes by itself if needed). Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>