2017-07-20 16:47 GMT+08:00 Minchan Kim <minchan@xxxxxxxxxx>:
> Hi Hui,
>
> On Thu, Jul 20, 2017 at 02:39:17PM +0800, Hui Zhu wrote:
>> Hi Minchan,
>>
>> I am sorry for the late reply.
>> I spent some time on Ubuntu 16.04 with mmtests on an old laptop.
>>
>> 2017-07-17 13:39 GMT+08:00 Minchan Kim <minchan@xxxxxxxxxx>:
>> > Hello Hui,
>> >
>> > On Fri, Jul 14, 2017 at 03:51:07PM +0800, Hui Zhu wrote:
>> >> Got some -EBUSY from zs_page_migrate that will make migration
>> >> slow (retry) or fail (zs_page_putback will schedule_work free_work,
>> >> but it cannot ensure the success).
>> >
>> > I think EAGAIN(migration retrial) is better than EBUSY(bailout) because
>> > the expectation is that zsmalloc will release the empty zs_page soon, so
>> > the next retrial will succeed.
>>
>> I am not sure.
>>
>> This is the call trace of zs_page_migrate:
>> zs_page_migrate
>> mapping->a_ops->migratepage
>> move_to_new_page
>> __unmap_and_move
>> unmap_and_move
>> migrate_pages
>>
>> unmap_and_move will remove the page from the migration page list
>> and call putback_movable_page (which calls mapping->a_ops->putback_page)
>> if the return value of zs_page_migrate is not -EAGAIN.
>> The point of that putback: after mapping->a_ops->putback_page has been
>> called, zsmalloc can free the page from the ZS_EMPTY list.
>>
>> If it returns -EAGAIN, the page will not be put back. An -EAGAIN page
>> will be retried in migrate_pages without being re-isolated.
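>>
>> A simplified sketch of that caller logic (paraphrased from mm/migrate.c
>> of this era, not the exact source):
>>
>>         rc = __unmap_and_move(page, newpage, force, mode);
>>         if (rc != -EAGAIN)
>>                 list_del(&page->lru);   /* dropped from the migration list */
>>         if (rc != MIGRATEPAGE_SUCCESS && rc != -EAGAIN)
>>                 /* ->putback_page, so zsmalloc can schedule free_work
>>                  * for an empty zspage */
>>                 putback_movable_page(page);
>>         /* with -EAGAIN the page stays on the list and is simply retried */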
>
> You're right. With -EAGAIN, it burns CPU pointlessly.
>
>> > About schedule_work, as you said, we cannot be sure when it happens, but
>> > I believe it will happen within a migration iteration in most cases.
>> > How often do you see that case?
>>
>> I noticed this issue because of my kernel patch
>> https://lkml.org/lkml/2014/5/28/113, which removes the retry in
>> __alloc_contig_migrate_range.
>> That retry handles the -EBUSY because it re-isolates the page
>> and re-calls migrate_pages.
>> Without it, cma_alloc fails at once with -EBUSY.
>
> The lkml.org server is not responding, so it is hard to see the patch you
> mentioned, but I got your point now, so I don't care any more. Your patch
> is simple enough considering the benefit.
> Just look at the comment below.
>
>> >
>> >>
>> >> And I didn't find anything that makes zs_page_migrate unable to work
>> >> with a ZS_EMPTY zspage.
>> >> So make the patch not check inuse if migrate_mode is not
>> >> MIGRATE_ASYNC.
>> >
>> > At a first glance, I think it works, but the question is whether it has
>> > the same problem as the schedule_work of zs_page_putback. IOW, until the
>> > work is done, compaction cannot succeed. Do you have any numbers before
>> > and after?
>> >
>>
>> Following is what I got with highalloc-performance in a vbox with 2
>> CPUs, 1G of memory, and a 512 zram as swap:
>>
>>                                          ori         afte
>>                                         orig        after
>> Minor Faults                        50805113     50801261
>> Major Faults                           43918        46692
>> Swap Ins                               42087        46299
>> Swap Outs                              89718       105495
>> Allocation stalls                          0            0
>> DMA allocs                             57787        69787
>> DMA32 allocs                        47964599     47983772
>> Normal allocs                              0            0
>> Movable allocs                             0            0
>> Direct pages scanned                   45493        28837
>> Kswapd pages scanned                 1565222      1512947
>> Kswapd pages reclaimed               1342222      1334030
>> Direct pages reclaimed                 45615        30174
>> Kswapd efficiency                        85%          88%
>> Kswapd velocity                     1897.101     1708.309
>> Direct efficiency                       100%         104%
>> Direct velocity                       55.139       32.561
>> Percentage direct scans                   2%           1%
>> Zone normal velocity                1952.240     1740.870
>> Zone dma32 velocity                    0.000        0.000
>> Zone dma velocity                      0.000        0.000
>> Page writes by reclaim             89764.000   106043.000
>> Page writes file                          46          548
>> Page writes anon                       89718       105495
>> Page reclaim immediate                 21457         7269
>> Sector Reads                         3259688      3144160
>> Sector Writes                        3667252      3675528
>> Page rescued immediate                     0            0
>> Slabs scanned                        1042872      1035438
>> Direct inode steals                     8042         7772
>> Kswapd inode steals                    54295        55075
>> Kswapd skipped wait                        0            0
>> THP fault alloc                          175          200
>> THP collapse alloc                       226          363
>> THP splits                                 0            0
>> THP fault fallback                        11            1
>> THP collapse fail                          3            1
>> Compaction stalls                        536          647
>> Compaction success                       322          384
>> Compaction failures                      214          263
>> Page migrate success                  119608       127002
>> Page migrate failure                    2723         2309
>> Compaction pages isolated             250179       265318
>> Compaction migrate scanned           9131832      9351314
>> Compaction free scanned              2093272      3059014
>> Compaction cost                          192          202
>> NUMA alloc hit                      47124555     47086375
>> NUMA alloc miss                            0            0
>> NUMA interleave hit                        0            0
>> NUMA alloc local                    47124555     47086375
>> NUMA base PTE updates                      0            0
>> NUMA huge PMD updates                      0            0
>> NUMA page range updates                    0            0
>> NUMA hint faults                           0            0
>> NUMA hint local faults                     0            0
>> NUMA hint local percent                  100          100
>> NUMA pages migrated                        0            0
>> AutoNUMA cost                             0%           0%
>>
>> It looks like Page migrate success is increased.
>>
>> Thanks,
>> Hui
>>
>> > Thanks.
>> >
>> >> Signed-off-by: Hui Zhu <zhuhui@xxxxxxxxxx>
>> >> ---
>> >>  mm/zsmalloc.c | 66 +++++++++++++++++++++++++++++++++--------------------------
>> >>  1 file changed, 37 insertions(+), 29 deletions(-)
>> >>
>> >> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
>> >> index d41edd2..c298e5c 100644
>> >> --- a/mm/zsmalloc.c
>> >> +++ b/mm/zsmalloc.c
>> >> @@ -1982,6 +1982,7 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
>> >>          unsigned long old_obj, new_obj;
>> >>          unsigned int obj_idx;
>> >>          int ret = -EAGAIN;
>> >> +        int inuse;
>> >>
>> >>          VM_BUG_ON_PAGE(!PageMovable(page), page);
>> >>          VM_BUG_ON_PAGE(!PageIsolated(page), page);
>> >> @@ -1996,21 +1997,24 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
>> >>          offset = get_first_obj_offset(page);
>> >>
>> >>          spin_lock(&class->lock);
>> >> -        if (!get_zspage_inuse(zspage)) {
>> >> +        inuse = get_zspage_inuse(zspage);
>> >> +        if (mode == MIGRATE_ASYNC && !inuse) {
>> >>                  ret = -EBUSY;
>> >>                  goto unlock_class;
>> >>          }
>> >>
>> >>          pos = offset;
>> >>          s_addr = kmap_atomic(page);
>> >> -        while (pos < PAGE_SIZE) {
>> >> -                head = obj_to_head(page, s_addr + pos);
>> >> -                if (head & OBJ_ALLOCATED_TAG) {
>> >> -                        handle = head & ~OBJ_ALLOCATED_TAG;
>> >> -                        if (!trypin_tag(handle))
>> >> -                                goto unpin_objects;
>> >> +        if (inuse) {
>
> I don't want to add an inuse check to every loop. It might avoid
> unnecessary iterations in each loop of zs_page_migrate, so it is an
> optimization, not a correction. As I consider it would happen rarely,
> I think we don't need to add the check. Could you just remove the
> get_zspage_inuse check instead?
>
> Like this:
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 013eea76685e..2d3d75fb0f16 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1980,14 +1980,9 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
>          pool = mapping->private_data;
>          class = pool->size_class[class_idx];
>          offset = get_first_obj_offset(page);
> +        pos = offset;
>
>          spin_lock(&class->lock);
> -        if (!get_zspage_inuse(zspage)) {
> -                ret = -EBUSY;
> -                goto unlock_class;
> -        }
> -
> -        pos = offset;
>          s_addr = kmap_atomic(page);
>          while (pos < PAGE_SIZE) {
>                  head = obj_to_head(page, s_addr + pos);
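
If I read the zsmalloc code right, dropping the check is safe even for
an empty zspage: no object header in it carries OBJ_ALLOCATED_TAG (free
slots store a free-list link instead), so the pin loop walks the page
without pinning anything and the migration proceeds. An illustrative
fragment, not the exact source:

        pos = offset;
        while (pos < PAGE_SIZE) {
                head = obj_to_head(page, s_addr + pos);
                if (head & OBJ_ALLOCATED_TAG) {
                        /* never taken when the zspage is empty */
                }
                pos += class->size;
        }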
What about setting pos (via offset) to avoid the loops?

@@ -1997,8 +1997,10 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,

         spin_lock(&class->lock);
         if (!get_zspage_inuse(zspage)) {
-                ret = -EBUSY;
-                goto unlock_class;
+                /* The page is empty.  Set "offset" to the end of the
+                 * page so that the loops over the page are avoided.
+                 */
+                offset = PAGE_SIZE;
         }
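
With that, pos = offset = PAGE_SIZE, so all three loops over the objects
run zero iterations. A hypothetical walk-through:

        pos = offset;                   /* pos == PAGE_SIZE */
        while (pos < PAGE_SIZE) {       /* false on entry: pin loop skipped */
        }

        for (addr = s_addr + offset; addr < s_addr + pos;
                        addr += class->size) {
                /* record loop and unpin loop: offset == pos,
                 * so they are never entered either */
        }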
Thanks,
Hui

>
>> >> +                while (pos < PAGE_SIZE) {
>> >> +                        head = obj_to_head(page, s_addr + pos);
>> >> +                        if (head & OBJ_ALLOCATED_TAG) {
>> >> +                                handle = head & ~OBJ_ALLOCATED_TAG;
>> >> +                                if (!trypin_tag(handle))
>> >> +                                        goto unpin_objects;
>> >> +                        }
>> >> +                        pos += class->size;
>> >>                  }
>> >> -                pos += class->size;
>> >>          }
>> >>
>> >>          /*
>> >> @@ -2020,20 +2024,22 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
>> >>          memcpy(d_addr, s_addr, PAGE_SIZE);
>> >>          kunmap_atomic(d_addr);
>> >>
>> >> -        for (addr = s_addr + offset; addr < s_addr + pos;
>> >> -                        addr += class->size) {
>> >> -                head = obj_to_head(page, addr);
>> >> -                if (head & OBJ_ALLOCATED_TAG) {
>> >> -                        handle = head & ~OBJ_ALLOCATED_TAG;
>> >> -                        if (!testpin_tag(handle))
>> >> -                                BUG();
>> >> -
>> >> -                        old_obj = handle_to_obj(handle);
>> >> -                        obj_to_location(old_obj, &dummy, &obj_idx);
>> >> -                        new_obj = (unsigned long)location_to_obj(newpage,
>> >> -                                                obj_idx);
>> >> -                        new_obj |= BIT(HANDLE_PIN_BIT);
>> >> -                        record_obj(handle, new_obj);
>> >> +        if (inuse) {
>> >> +                for (addr = s_addr + offset; addr < s_addr + pos;
>> >> +                                addr += class->size) {
>> >> +                        head = obj_to_head(page, addr);
>> >> +                        if (head & OBJ_ALLOCATED_TAG) {
>> >> +                                handle = head & ~OBJ_ALLOCATED_TAG;
>> >> +                                if (!testpin_tag(handle))
>> >> +                                        BUG();
>> >> +
>> >> +                                old_obj = handle_to_obj(handle);
>> >> +                                obj_to_location(old_obj, &dummy, &obj_idx);
>> >> +                                new_obj = (unsigned long)
>> >> +                                        location_to_obj(newpage, obj_idx);
>> >> +                                new_obj |= BIT(HANDLE_PIN_BIT);
>> >> +                                record_obj(handle, new_obj);
>> >> +                        }
>> >>                  }
>> >>          }
>> >>
>> >> @@ -2055,14 +2061,16 @@ int zs_page_migrate(struct address_space *mapping, struct page *newpage,
>> >>
>> >>          ret = MIGRATEPAGE_SUCCESS;
>> >>  unpin_objects:
>> >> -        for (addr = s_addr + offset; addr < s_addr + pos;
>> >> +        if (inuse) {
>> >> +                for (addr = s_addr + offset; addr < s_addr + pos;
>> >>                          addr += class->size) {
>> >> -                head = obj_to_head(page, addr);
>> >> -                if (head & OBJ_ALLOCATED_TAG) {
>> >> -                        handle = head & ~OBJ_ALLOCATED_TAG;
>> >> -                        if (!testpin_tag(handle))
>> >> -                                BUG();
>> >> -                        unpin_tag(handle);
>> >> +                        head = obj_to_head(page, addr);
>> >> +                        if (head & OBJ_ALLOCATED_TAG) {
>> >> +                                handle = head & ~OBJ_ALLOCATED_TAG;
>> >> +                                if (!testpin_tag(handle))
>> >> +                                        BUG();
>> >> +                                unpin_tag(handle);
>> >> +                        }
>> >>                  }
>> >>          }
>> >>          kunmap_atomic(s_addr);
>> >> --
>> >> 1.9.1