On Tue, Aug 27, 2019 at 03:17:39PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 27, 2019 at 02:09:23PM +0200, Michal Hocko wrote:
> > On Tue 27-08-19 14:01:56, Vlastimil Babka wrote:
> > > On 8/27/19 1:02 PM, Kirill A. Shutemov wrote:
> > > > On Tue, Aug 27, 2019 at 08:01:39AM +0200, Michal Hocko wrote:
> > > >> On Mon 26-08-19 16:15:38, Kirill A. Shutemov wrote:
> > > >>>
> > > >>> Completely unmapped pages will be freed with the current code.
> > > >>> Deferred split only applies to partly mapped THPs: at least one
> > > >>> 4k piece of the THP is still mapped somewhere.
> > > >>
> > > >> Hmm, I am probably misreading the code, but at least the current
> > > >> Linus' tree reads page_remove_rmap -> page_remove_anon_compound_rmap
> > > >> -> deferred_split_huge_page even for a fully mapped THP.
> > > >
> > > > Well, you read it correctly, but it was not intended. I screwed it
> > > > up at some point.
> > > >
> > > > See the patch below. It should make it work as intended.
> > > >
> > > > It's not a bug as such, but an inefficiency. We add the page to the
> > > > queue where it's not needed.
> > >
> > > But that adding to the queue doesn't affect whether the page will be
> > > freed immediately if there are no more partial mappings, right? I
> > > don't see deferred_split_huge_page() pinning the page.
> > > So your patch wouldn't make THPs be freed immediately in cases where
> > > they weren't freed immediately before; it just fixes a minor
> > > inefficiency with queue manipulation?
> >
> > Ohh, right. I can see that in free_transhuge_page now. So fully mapped
> > THPs really do not matter, and what I had considered an odd case is
> > really happening more often.
> >
> > That being said, this will not help at all with what Yang Shi is seeing,
> > and we need more proactive deferred splitting, as I've mentioned
> > earlier.
>
> It was not intended to fix the issue. It's a fix for the current logic.
> I'm playing with the work approach now.

Below is what I've come up with. It appears to be functional. Any comments?
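(For context on the free path discussed above: free_transhuge_page unhooks the
page from the deferred split queue before freeing it, which is why a queued but
fully unmapped THP is still freed immediately. Roughly, in the v5.3-era
mm/huge_memory.c it looks like the simplified paraphrase below -- shown only for
reference, not part of the patch that follows.)

        /*
         * Paraphrased sketch of the existing free path: when the last
         * reference to a THP is dropped, take it off the per-node deferred
         * split queue (if it was queued) and then free the compound page.
         */
        void free_transhuge_page(struct page *page)
        {
                struct pglist_data *pgdata = NODE_DATA(page_to_nid(page));
                unsigned long flags;

                spin_lock_irqsave(&pgdata->split_queue_lock, flags);
                if (!list_empty(page_deferred_list(page))) {
                        /* Still queued for deferred split: unlink it first. */
                        pgdata->split_queue_len--;
                        list_del(page_deferred_list(page));
                }
                spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
                free_compound_page(page);
        }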
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d77d717c620c..c576e9d772b7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -758,6 +758,8 @@ typedef struct pglist_data {
 	spinlock_t split_queue_lock;
 	struct list_head split_queue;
 	unsigned long split_queue_len;
+	unsigned int deferred_split_calls;
+	struct work_struct deferred_split_work;
 #endif
 
 	/* Fields commonly accessed by the page reclaim scanner */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de1f15969e27..12d109bbe8ac 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2820,22 +2820,6 @@ void free_transhuge_page(struct page *page)
 	free_compound_page(page);
 }
 
-void deferred_split_huge_page(struct page *page)
-{
-	struct pglist_data *pgdata = NODE_DATA(page_to_nid(page));
-	unsigned long flags;
-
-	VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-
-	spin_lock_irqsave(&pgdata->split_queue_lock, flags);
-	if (list_empty(page_deferred_list(page))) {
-		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
-		list_add_tail(page_deferred_list(page), &pgdata->split_queue);
-		pgdata->split_queue_len++;
-	}
-	spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
-}
-
 static unsigned long deferred_split_count(struct shrinker *shrink,
 		struct shrink_control *sc)
 {
@@ -2901,6 +2885,44 @@ static struct shrinker deferred_split_shrinker = {
 	.flags = SHRINKER_NUMA_AWARE,
 };
 
+void flush_deferred_split_queue(struct work_struct *work)
+{
+	struct pglist_data *pgdata;
+	struct shrink_control sc;
+
+	pgdata = container_of(work, struct pglist_data, deferred_split_work);
+	sc.nid = pgdata->node_id;
+	sc.nr_to_scan = 0; /* Unlimited */
+
+	deferred_split_scan(NULL, &sc);
+}
+
+#define NR_CALLS_TO_SPLIT 32
+#define NR_PAGES_ON_QUEUE_TO_SPLIT 16
+
+void deferred_split_huge_page(struct page *page)
+{
+	struct pglist_data *pgdata = NODE_DATA(page_to_nid(page));
+	unsigned long flags;
+
+	VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+
+	spin_lock_irqsave(&pgdata->split_queue_lock, flags);
+	if (list_empty(page_deferred_list(page))) {
+		count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+		list_add_tail(page_deferred_list(page), &pgdata->split_queue);
+		pgdata->split_queue_len++;
+		pgdata->deferred_split_calls++;
+	}
+
+	if (pgdata->deferred_split_calls > NR_CALLS_TO_SPLIT &&
+	    pgdata->split_queue_len > NR_PAGES_ON_QUEUE_TO_SPLIT) {
+		pgdata->deferred_split_calls = 0;
+		schedule_work(&pgdata->deferred_split_work);
+	}
+	spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int split_huge_pages_set(void *data, u64 val)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9c9194959271..86af66d463e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6636,11 +6636,14 @@ static unsigned long __init calc_memmap_size(unsigned long spanned_pages,
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void flush_deferred_split_queue(struct work_struct *work);
 static void pgdat_init_split_queue(struct pglist_data *pgdat)
 {
 	spin_lock_init(&pgdat->split_queue_lock);
 	INIT_LIST_HEAD(&pgdat->split_queue);
 	pgdat->split_queue_len = 0;
+	pgdat->deferred_split_calls = 0;
+	INIT_WORK(&pgdat->deferred_split_work, flush_deferred_split_queue);
 }
 #else
 static void pgdat_init_split_queue(struct pglist_data *pgdat) {}

-- 
 Kirill A. Shutemov