We encountered many kernel exceptions of VM_BUG_ON(zspage->isolated ==
0) in dec_zspage_isolation() and BUG_ON(!pages[1]) in zs_unmap_object()
lately.  This issue only occurs when migration and reclamation run at
the same time.

With our memory stress test, we can reproduce this issue several times
a day.  We do not know why nobody else has reported it; we only
switched to the kernel version containing this defect a few months ago.

Since fullness and isolated share the same unsigned int, modifications
to them must be protected by the same lock.

[andrew.yang@xxxxxxxxxxxx: move comment]
Link: https://lkml.kernel.org/r/20230727062910.6337-1-andrew.yang@xxxxxxxxxxxx
Link: https://lkml.kernel.org/r/20230721063705.11455-1-andrew.yang@xxxxxxxxxxxx
Fixes: c4549b871102 ("zsmalloc: remove zspage isolation for migration")
Signed-off-by: Andrew Yang <andrew.yang@xxxxxxxxxxxx>
Reviewed-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
Cc: Matthias Brugger <matthias.bgg@xxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
(cherry picked from commit 4b5d1e47b69426c0f7491d97d73ad0152d02d437)
---
 mm/zsmalloc.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index d03941cace2c..aa1cb03ad72c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1821,6 +1821,7 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 
 static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
 {
+	struct size_class *class;
 	struct zspage *zspage;
 
 	/*
@@ -1831,9 +1832,10 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
 	VM_BUG_ON_PAGE(PageIsolated(page), page);
 
 	zspage = get_zspage(page);
-	migrate_write_lock(zspage);
+	class = zspage_class(zspage->pool, zspage);
+	spin_lock(&class->lock);
 	inc_zspage_isolation(zspage);
-	migrate_write_unlock(zspage);
+	spin_unlock(&class->lock);
 
 	return true;
 }
@@ -1909,8 +1911,8 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	 * it's okay to release migration_lock.
	 */
 	write_unlock(&pool->migrate_lock);
-	spin_unlock(&class->lock);
 	dec_zspage_isolation(zspage);
+	spin_unlock(&class->lock);
 	migrate_write_unlock(zspage);
 
 	get_page(newpage);
@@ -1927,15 +1929,17 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 
 static void zs_page_putback(struct page *page)
 {
+	struct size_class *class;
 	struct zspage *zspage;
 
 	VM_BUG_ON_PAGE(!PageMovable(page), page);
 	VM_BUG_ON_PAGE(!PageIsolated(page), page);
 
 	zspage = get_zspage(page);
-	migrate_write_lock(zspage);
+	class = zspage_class(zspage->pool, zspage);
+	spin_lock(&class->lock);
 	dec_zspage_isolation(zspage);
-	migrate_write_unlock(zspage);
+	spin_unlock(&class->lock);
 }
 
 static const struct movable_operations zsmalloc_mops = {
-- 
2.18.0
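
For illustration only (not part of the patch): a minimal userspace C
sketch of why two bitfields packed into the same unsigned int must be
updated under the same lock.  The struct name, field widths, and helper
names below are illustrative assumptions, not the exact kernel values.

	/*
	 * "fullness" and "isolated" share one machine word, so the
	 * compiler implements an update to either field as a load of
	 * the whole word, a bit-merge, and a store of the whole word.
	 */
	struct zspage_bits {
		unsigned int fullness:2;	/* e.g. updated by reclaim */
		unsigned int isolated:3;	/* e.g. updated by migration */
	};

	static void set_fullness(struct zspage_bits *z, unsigned int f)
	{
		z->fullness = f;	/* non-atomic read-modify-write */
	}

	static void inc_isolated(struct zspage_bits *z)
	{
		z->isolated++;		/* same word: can clobber a
					 * concurrent set_fullness() */
	}

If set_fullness() runs under one lock while inc_isolated() runs under a
different one, each CPU can store back a stale copy of the other CPU's
field; this is the race the patch closes by taking class->lock around
the isolated updates as well.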