Hi Hui,

On Thu, Jul 20, 2017 at 02:39:17PM +0800, Hui Zhu wrote:
> Hi Minchan,
>
> I am sorry for answering late.
> I spent some time on Ubuntu 16.04 with mmtests in an old laptop.
>
> 2017-07-17 13:39 GMT+08:00 Minchan Kim <minchan@xxxxxxxxxx>:
> > Hello Hui,
> >
> > On Fri, Jul 14, 2017 at 03:51:07PM +0800, Hui Zhu wrote:
> >> Got some -EBUSY from zs_page_migrate that will make migration
> >> slow (retry) or fail (zs_page_putback will schedule_work free_work,
> >> but it cannot ensure success).
> >
> > I think EAGAIN (migration retrial) is better than EBUSY (bailout) because
> > the expectation is that zsmalloc will release the empty zspage soon, so
> > the next retrial will succeed.
>
> I am not sure.
> This is the call trace of zs_page_migrate:
>   zs_page_migrate
>   mapping->a_ops->migratepage
>   move_to_new_page
>   __unmap_and_move
>   unmap_and_move
>   migrate_pages
>
> unmap_and_move will remove the page from the migration page list
> and call putback_movable_page (which calls mapping->a_ops->putback_page)
> if the return value of zs_page_migrate is not -EAGAIN.
> The point of this part: after mapping->a_ops->putback_page has been
> called, zsmalloc can free the page from the ZS_EMPTY list.
>
> If -EAGAIN is returned, the page will not be put back. An -EAGAIN page
> will be tried again in migrate_pages without being re-isolated.

You're right. With -EAGAIN, it burns CPU pointlessly (see the sketch of
migrate_pages()'s retry loop below).

> > About schedule_work: as you said, we cannot be sure when it happens, but
> > I believe it will happen within a migration iteration in most cases.
> > How often do you see that case?
>
> I noticed this issue because of my kernel patch
> https://lkml.org/lkml/2014/5/28/113, which removes the retry in
> __alloc_contig_migrate_range.
> This retry handles the -EBUSY because it re-isolates the page
> and re-calls migrate_pages.
> Without it, cma_alloc fails at once with -EBUSY.

The LKML.org server is not responding, so it is hard to see the patch
you mentioned, but I get your point now, so I don't mind any more.
Your patch is simple enough considering the benefit.
Just look at the comment below.

> >
> >>
> >> And I didn't find anything that makes zs_page_migrate unable to work
> >> with a ZS_EMPTY zspage.
> >> So make the patch to not check inuse if migrate_mode is not
> >> MIGRATE_ASYNC.
> >
> > At first glance, I think it works, but the question is whether it has
> > the same problem with the schedule_work of zs_page_putback. IOW, until
> > the work is done, compaction cannot succeed. Do you have any number
> > before and after?
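Here is the sketch I mentioned. It is a self-contained userspace model,
not the kernel code: unmap_and_move_model() and MODEL_EAGAIN are made
up, and only the control flow is meant to mirror mm/migrate.c, so treat
it as an illustration of the -EAGAIN behaviour rather than a reference:

/* Toy model of migrate_pages()'s retry loop.  Build with: cc model.c */
#include <stdio.h>

#define MIGRATEPAGE_SUCCESS 0
#define MODEL_EAGAIN        11  /* stands in for kernel -EAGAIN */

/*
 * Pretend the source page is an empty zspage whose free_work has not
 * run yet: the first two migration attempts fail with -EAGAIN, the
 * third succeeds once the zspage has finally been freed.
 */
static int unmap_and_move_model(int *attempts)
{
    return (++*attempts < 3) ? -MODEL_EAGAIN : MIGRATEPAGE_SUCCESS;
}

int main(void)
{
    int attempts = 0, pass, retry = 1, rc;

    /* like migrate_pages(): up to 10 passes while anything said -EAGAIN */
    for (pass = 0; pass < 10 && retry; pass++) {
        retry = 0;
        rc = unmap_and_move_model(&attempts);
        if (rc == -MODEL_EAGAIN) {
            /* page stays on the migration list: no putback, no
             * ->putback_page, so zsmalloc cannot free it and we
             * simply spin through passes */
            retry++;
        } else if (rc != MIGRATEPAGE_SUCCESS) {
            /* e.g. -EBUSY: the page has already been removed from the
             * list and putback_movable_page() has run, which is what
             * lets free_work reclaim the empty zspage */
        }
        printf("pass %d: rc = %d\n", pass, rc);
    }
    return 0;
}

Running it prints three passes before success, which is exactly the
pointless spinning above: the page is neither migrated nor put back
until zsmalloc finally releases the empty zspage.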
> >
>
> Following is what I got with highalloc-performance in a vbox with
> 2 cpus, 1G memory, 512 zram as swap:
>
>                                       orig       after
> Minor Faults                      50805113    50801261
> Major Faults                         43918       46692
> Swap Ins                             42087       46299
> Swap Outs                            89718      105495
> Allocation stalls                        0           0
> DMA allocs                           57787       69787
> DMA32 allocs                      47964599    47983772
> Normal allocs                            0           0
> Movable allocs                           0           0
> Direct pages scanned                 45493       28837
> Kswapd pages scanned               1565222     1512947
> Kswapd pages reclaimed             1342222     1334030
> Direct pages reclaimed               45615       30174
> Kswapd efficiency                      85%         88%
> Kswapd velocity                   1897.101    1708.309
> Direct efficiency                     100%        104%
> Direct velocity                     55.139      32.561
> Percentage direct scans                 2%          1%
> Zone normal velocity              1952.240    1740.870
> Zone dma32 velocity                  0.000       0.000
> Zone dma velocity                    0.000       0.000
> Page writes by reclaim           89764.000  106043.000
> Page writes file                        46         548
> Page writes anon                     89718      105495
> Page reclaim immediate               21457        7269
> Sector Reads                       3259688     3144160
> Sector Writes                      3667252     3675528
> Page rescued immediate                   0           0
> Slabs scanned                      1042872     1035438
> Direct inode steals                   8042        7772
> Kswapd inode steals                  54295       55075
> Kswapd skipped wait                      0           0
> THP fault alloc                        175         200
> THP collapse alloc                     226         363
> THP splits                               0           0
> THP fault fallback                      11           1
> THP collapse fail                        3           1
> Compaction stalls                      536         647
> Compaction success                     322         384
> Compaction failures                    214         263
> Page migrate success                119608      127002
> Page migrate failure                  2723        2309
> Compaction pages isolated           250179      265318
> Compaction migrate scanned         9131832     9351314
> Compaction free scanned            2093272     3059014
> Compaction cost                        192         202
> NUMA alloc hit                    47124555    47086375
> NUMA alloc miss                          0           0
> NUMA interleave hit                      0           0
> NUMA alloc local                  47124555    47086375
> NUMA base PTE updates                    0           0
> NUMA huge PMD updates                    0           0
> NUMA page range updates                  0           0
> NUMA hint faults                         0           0
> NUMA hint local faults                   0           0
> NUMA hint local percent                100         100
> NUMA pages migrated                      0           0
> AutoNUMA cost                           0%          0%
>
> It looks like Page migrate success is increased.
>
> Thanks,
> Hui
>
> > Thanks.
> >
> >>
> >> Signed-off-by: Hui Zhu <zhuhui@xxxxxxxxxx>
> >> ---
> >>  mm/zsmalloc.c | 66 +++++++++++++++++++++++++++++++++--------------------------
> >>  1 file changed, 37 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> >> index d41edd2..c298e5c 100644
> >> --- a/mm/zsmalloc.c
> >> +++ b/mm/zsmalloc.c
> >> @@ -1982,6 +1982,7 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >>      unsigned long old_obj, new_obj;
> >>      unsigned int obj_idx;
> >>      int ret = -EAGAIN;
> >> +    int inuse;
> >>
> >>      VM_BUG_ON_PAGE(!PageMovable(page), page);
> >>      VM_BUG_ON_PAGE(!PageIsolated(page), page);
> >>
> >> @@ -1996,21 +1997,24 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >>      offset = get_first_obj_offset(page);
> >>
> >>      spin_lock(&class->lock);
> >> -    if (!get_zspage_inuse(zspage)) {
> >> +    inuse = get_zspage_inuse(zspage);
> >> +    if (mode == MIGRATE_ASYNC && !inuse) {
> >>          ret = -EBUSY;
> >>          goto unlock_class;
> >>      }
> >>
> >>      pos = offset;
> >>      s_addr = kmap_atomic(page);
> >> -    while (pos < PAGE_SIZE) {
> >> -        head = obj_to_head(page, s_addr + pos);
> >> -        if (head & OBJ_ALLOCATED_TAG) {
> >> -            handle = head & ~OBJ_ALLOCATED_TAG;
> >> -            if (!trypin_tag(handle))
> >> -                goto unpin_objects;
> >> +    if (inuse) {

I don't want to add an inuse check around every loop. It might avoid
unnecessary looping in zs_page_migrate, so it is an optimization, not a
correction. As I consider it would happen rarely, I don't think we need
to add the check.
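To see why the check is about optimization and not correctness, note
what the pin loop does on an empty zspage: no object head carries
OBJ_ALLOCATED_TAG, so nothing is pinned, nothing is remapped, and the
migration degenerates into a plain page copy. A toy, self-contained
model of that scan (the heads[] array and MODEL_CLASS_SIZE are invented
for illustration and are not zsmalloc's real layout):

/* Toy model: scanning an empty zspage pins nothing. */
#include <stdio.h>

#define MODEL_PAGE_SIZE   4096
#define MODEL_CLASS_SIZE  128   /* made-up size class */
#define OBJ_ALLOCATED_TAG 1UL

/* all zeroes == an empty zspage: no head has OBJ_ALLOCATED_TAG set */
static unsigned long heads[MODEL_PAGE_SIZE / MODEL_CLASS_SIZE];

int main(void)
{
    unsigned int pos, pinned = 0;

    for (pos = 0; pos < MODEL_PAGE_SIZE; pos += MODEL_CLASS_SIZE) {
        unsigned long head = heads[pos / MODEL_CLASS_SIZE];

        if (head & OBJ_ALLOCATED_TAG)
            pinned++;   /* the kernel would trypin_tag() here */
    }
    printf("objects pinned on an empty zspage: %u\n", pinned); /* 0 */
    return 0;
}

So the only cost of dropping the check is one harmless scan over the
page in the rare empty case.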
Could you just remove the get_zspage_inuse check instead? Like this:

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 013eea76685e..2d3d75fb0f16 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1980,14 +1980,9 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
     pool = mapping->private_data;
     class = pool->size_class[class_idx];
     offset = get_first_obj_offset(page);
+    pos = offset;

     spin_lock(&class->lock);
-    if (!get_zspage_inuse(zspage)) {
-        ret = -EBUSY;
-        goto unlock_class;
-    }
-
-    pos = offset;
     s_addr = kmap_atomic(page);
     while (pos < PAGE_SIZE) {
         head = obj_to_head(page, s_addr + pos);

> >> +        while (pos < PAGE_SIZE) {
> >> +            head = obj_to_head(page, s_addr + pos);
> >> +            if (head & OBJ_ALLOCATED_TAG) {
> >> +                handle = head & ~OBJ_ALLOCATED_TAG;
> >> +                if (!trypin_tag(handle))
> >> +                    goto unpin_objects;
> >> +            }
> >> +            pos += class->size;
> >>          }
> >> -        pos += class->size;
> >>      }
> >>
> >>      /*
> >> @@ -2020,20 +2024,22 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >>      memcpy(d_addr, s_addr, PAGE_SIZE);
> >>      kunmap_atomic(d_addr);
> >>
> >> -    for (addr = s_addr + offset; addr < s_addr + pos;
> >> -            addr += class->size) {
> >> -        head = obj_to_head(page, addr);
> >> -        if (head & OBJ_ALLOCATED_TAG) {
> >> -            handle = head & ~OBJ_ALLOCATED_TAG;
> >> -            if (!testpin_tag(handle))
> >> -                BUG();
> >> -
> >> -            old_obj = handle_to_obj(handle);
> >> -            obj_to_location(old_obj, &dummy, &obj_idx);
> >> -            new_obj = (unsigned long)location_to_obj(newpage,
> >> -                    obj_idx);
> >> -            new_obj |= BIT(HANDLE_PIN_BIT);
> >> -            record_obj(handle, new_obj);
> >> +    if (inuse) {
> >> +        for (addr = s_addr + offset; addr < s_addr + pos;
> >> +                addr += class->size) {
> >> +            head = obj_to_head(page, addr);
> >> +            if (head & OBJ_ALLOCATED_TAG) {
> >> +                handle = head & ~OBJ_ALLOCATED_TAG;
> >> +                if (!testpin_tag(handle))
> >> +                    BUG();
> >> +
> >> +                old_obj = handle_to_obj(handle);
> >> +                obj_to_location(old_obj, &dummy, &obj_idx);
> >> +                new_obj = (unsigned long)
> >> +                    location_to_obj(newpage, obj_idx);
> >> +                new_obj |= BIT(HANDLE_PIN_BIT);
> >> +                record_obj(handle, new_obj);
> >> +            }
> >>          }
> >>      }
> >>
> >> @@ -2055,14 +2061,16 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
> >>
> >>      ret = MIGRATEPAGE_SUCCESS;
> >> unpin_objects:
> >> -    for (addr = s_addr + offset; addr < s_addr + pos;
> >> +    if (inuse) {
> >> +        for (addr = s_addr + offset; addr < s_addr + pos;
> >>              addr += class->size) {
> >> -        head = obj_to_head(page, addr);
> >> -        if (head & OBJ_ALLOCATED_TAG) {
> >> -            handle = head & ~OBJ_ALLOCATED_TAG;
> >> -            if (!testpin_tag(handle))
> >> -                BUG();
> >> -            unpin_tag(handle);
> >> +        head = obj_to_head(page, addr);
> >> +        if (head & OBJ_ALLOCATED_TAG) {
> >> +            handle = head & ~OBJ_ALLOCATED_TAG;
> >> +            if (!testpin_tag(handle))
> >> +                BUG();
> >> +            unpin_tag(handle);
> >> +        }
> >>          }
> >>      }
> >>      kunmap_atomic(s_addr);
> >> --
> >> 1.9.1