Hi Minchan, I am sorry for answer late. I spent some time on ubuntu 16.04 with mmtests in an old laptop. 2017-07-17 13:39 GMT+08:00 Minchan Kim <minchan@xxxxxxxxxx>: > Hello Hui, > > On Fri, Jul 14, 2017 at 03:51:07PM +0800, Hui Zhu wrote: >> Got some -EBUSY from zs_page_migrate that will make migration >> slow (retry) or fail (zs_page_putback will schedule_work free_work, >> but it cannot ensure the success). > > I think EAGAIN(migration retrial) is better than EBUSY(bailout) because > expectation is that zsmalloc will release the empty zs_page soon so > at next retrial, it will be succeeded. I am not sure. This is the call trace of zs_page_migrate: zs_page_migrate mapping->a_ops->migratepage move_to_new_page __unmap_and_move unmap_and_move migrate_pages In unmap_and_move will remove page from migration page list and call putback_movable_page(will call mapping->a_ops->putback_page) if return value of zs_page_migrate is not -EAGAIN. The comments of this part: After called mapping->a_ops->putback_page, zsmalloc can free the page from ZS_EMPTY list. If retrun -EAGAIN, the page will be not be put back. EAGAIN page will be try again in migrate_pages without re-isolate. > About schedule_work, as you said, we don't make sure when it happens but > I believe it will happen in a migration iteration most of case. > How often do you see that case? I noticed this issue because my Kernel patch https://lkml.org/lkml/2014/5/28/113 that will remove retry in __alloc_contig_migrate_range. This retry willhandle the -EBUSY because it will re-isolate the page and re-call migrate_pages. Without it will make cma_alloc fail at once with -EBUSY. > >> >> And I didn't find anything that make zs_page_migrate cannot work with >> a ZS_EMPTY zspage. >> So make the patch to not check inuse if migrate_mode is not >> MIGRATE_ASYNC. > > At a first glance, I think it work but the question is that it a same problem > ith schedule_work of zs_page_putback. IOW, Until the work is done, compaction > cannot succeed. Do you have any number before and after? > Following is what I got with highalloc-performance in a vbox with 2 cpu 1G memory 512 zram as swap: ori afte orig after Minor Faults 50805113 50801261 Major Faults 43918 46692 Swap Ins 42087 46299 Swap Outs 89718 105495 Allocation stalls 0 0 DMA allocs 57787 69787 DMA32 allocs 47964599 47983772 Normal allocs 0 0 Movable allocs 0 0 Direct pages scanned 45493 28837 Kswapd pages scanned 1565222 1512947 Kswapd pages reclaimed 1342222 1334030 Direct pages reclaimed 45615 30174 Kswapd efficiency 85% 88% Kswapd velocity 1897.101 1708.309 Direct efficiency 100% 104% Direct velocity 55.139 32.561 Percentage direct scans 2% 1% Zone normal velocity 1952.240 1740.870 Zone dma32 velocity 0.000 0.000 Zone dma velocity 0.000 0.000 Page writes by reclaim 89764.000 106043.000 Page writes file 46 548 Page writes anon 89718 105495 Page reclaim immediate 21457 7269 Sector Reads 3259688 3144160 Sector Writes 3667252 3675528 Page rescued immediate 0 0 Slabs scanned 1042872 1035438 Direct inode steals 8042 7772 Kswapd inode steals 54295 55075 Kswapd skipped wait 0 0 THP fault alloc 175 200 THP collapse alloc 226 363 THP splits 0 0 THP fault fallback 11 1 THP collapse fail 3 1 Compaction stalls 536 647 Compaction success 322 384 Compaction failures 214 263 Page migrate success 119608 127002 Page migrate failure 2723 2309 Compaction pages isolated 250179 265318 Compaction migrate scanned 9131832 9351314 Compaction free scanned 2093272 3059014 Compaction cost 192 202 NUMA alloc hit 47124555 47086375 NUMA alloc miss 0 0 NUMA interleave hit 0 0 NUMA alloc local 47124555 47086375 NUMA base PTE updates 0 0 NUMA huge PMD updates 0 0 NUMA page range updates 0 0 NUMA hint faults 0 0 NUMA hint local faults 0 0 NUMA hint local percent 100 100 NUMA pages migrated 0 0 AutoNUMA cost 0% 0% It looks Page migrate success is increased. Thanks, Hui > Thanks. > >> >> Signed-off-by: Hui Zhu <zhuhui@xxxxxxxxxx> >> --- >> mm/zsmalloc.c | 66 +++++++++++++++++++++++++++++++++-------------------------- >> 1 file changed, 37 insertions(+), 29 deletions(-) >> >> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c >> index d41edd2..c298e5c 100644 >> --- a/mm/zsmalloc.c >> +++ b/mm/zsmalloc.c >> @@ -1982,6 +1982,7 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage, >> unsigned long old_obj, new_obj; >> unsigned int obj_idx; >> int ret = -EAGAIN; >> + int inuse; >> >> VM_BUG_ON_PAGE(!PageMovable(page), page); >> VM_BUG_ON_PAGE(!PageIsolated(page), page); >> @@ -1996,21 +1997,24 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage, >> offset = get_first_obj_offset(page); >> >> spin_lock(&class->lock); >> - if (!get_zspage_inuse(zspage)) { >> + inuse = get_zspage_inuse(zspage); >> + if (mode == MIGRATE_ASYNC && !inuse) { >> ret = -EBUSY; >> goto unlock_class; >> } >> >> pos = offset; >> s_addr = kmap_atomic(page); >> - while (pos < PAGE_SIZE) { >> - head = obj_to_head(page, s_addr + pos); >> - if (head & OBJ_ALLOCATED_TAG) { >> - handle = head & ~OBJ_ALLOCATED_TAG; >> - if (!trypin_tag(handle)) >> - goto unpin_objects; >> + if (inuse) { >> + while (pos < PAGE_SIZE) { >> + head = obj_to_head(page, s_addr + pos); >> + if (head & OBJ_ALLOCATED_TAG) { >> + handle = head & ~OBJ_ALLOCATED_TAG; >> + if (!trypin_tag(handle)) >> + goto unpin_objects; >> + } >> + pos += class->size; >> } >> - pos += class->size; >> } >> >> /* >> @@ -2020,20 +2024,22 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage, >> memcpy(d_addr, s_addr, PAGE_SIZE); >> kunmap_atomic(d_addr); >> >> - for (addr = s_addr + offset; addr < s_addr + pos; >> - addr += class->size) { >> - head = obj_to_head(page, addr); >> - if (head & OBJ_ALLOCATED_TAG) { >> - handle = head & ~OBJ_ALLOCATED_TAG; >> - if (!testpin_tag(handle)) >> - BUG(); >> - >> - old_obj = handle_to_obj(handle); >> - obj_to_location(old_obj, &dummy, &obj_idx); >> - new_obj = (unsigned long)location_to_obj(newpage, >> - obj_idx); >> - new_obj |= BIT(HANDLE_PIN_BIT); >> - record_obj(handle, new_obj); >> + if (inuse) { >> + for (addr = s_addr + offset; addr < s_addr + pos; >> + addr += class->size) { >> + head = obj_to_head(page, addr); >> + if (head & OBJ_ALLOCATED_TAG) { >> + handle = head & ~OBJ_ALLOCATED_TAG; >> + if (!testpin_tag(handle)) >> + BUG(); >> + >> + old_obj = handle_to_obj(handle); >> + obj_to_location(old_obj, &dummy, &obj_idx); >> + new_obj = (unsigned long) >> + location_to_obj(newpage, obj_idx); >> + new_obj |= BIT(HANDLE_PIN_BIT); >> + record_obj(handle, new_obj); >> + } >> } >> } >> >> @@ -2055,14 +2061,16 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage, >> >> ret = MIGRATEPAGE_SUCCESS; >> unpin_objects: >> - for (addr = s_addr + offset; addr < s_addr + pos; >> + if (inuse) { >> + for (addr = s_addr + offset; addr < s_addr + pos; >> addr += class->size) { >> - head = obj_to_head(page, addr); >> - if (head & OBJ_ALLOCATED_TAG) { >> - handle = head & ~OBJ_ALLOCATED_TAG; >> - if (!testpin_tag(handle)) >> - BUG(); >> - unpin_tag(handle); >> + head = obj_to_head(page, addr); >> + if (head & OBJ_ALLOCATED_TAG) { >> + handle = head & ~OBJ_ALLOCATED_TAG; >> + if (!testpin_tag(handle)) >> + BUG(); >> + unpin_tag(handle); >> + } >> } >> } >> kunmap_atomic(s_addr); >> -- >> 1.9.1 >> >> -- >> To unsubscribe, send a message with 'unsubscribe linux-mm' in >> the body to majordomo@xxxxxxxxx. For more info on Linux MM, >> see: http://www.linux-mm.org/ . >> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>