On Wed, Nov 02, 2022 at 12:42:06PM +0900, Sergey Senozhatsky wrote:
> On (22/10/27 11:27), Nhat Pham wrote:
> [..]
> > +static int zs_reclaim_page(struct zs_pool *pool, unsigned int retries)
> > +{
> > +	int i, obj_idx, ret = 0;
> > +	unsigned long handle;
> > +	struct zspage *zspage;
> > +	struct page *page;
> > +	enum fullness_group fullness;
> > +
> > +	/* Lock LRU and fullness list */
> > +	spin_lock(&pool->lock);
> > +	if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) ||
> > +			retries == 0) {
> > +		spin_unlock(&pool->lock);
> > +		return -EINVAL;
> > +	}
> > +
> > +	for (i = 0; i < retries; i++) {
> > +		struct size_class *class;
> > +
> > +		zspage = list_last_entry(&pool->lru, struct zspage, lru);
> > +		list_del(&zspage->lru);
> > +
> > +		/* zs_free may free objects, but not the zspage and handles */
> > +		zspage->under_reclaim = true;
> > +
> > +		/* Lock backing pages into place */
> > +		lock_zspage(zspage);
>
> Does this call into the scheduler under pool->lock spinlock?

Good catch! We went back and checked our logs, and this didn't actually
hit in our production. We also couldn't trigger it with explicit
compaction.

It's an easy fix, the page locks can be acquired after dropping the
pool lock.
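
To illustrate the ordering only, here is a rough userspace sketch, not the
actual patch. It assumes Linux/glibc pthreads: a pthread spinlock stands in
for pool->lock and a pthread mutex stands in for the sleeping page locks
taken by lock_zspage(). The struct layouts and the helpers pool_init(),
zspage_init(), lru_add() and reclaim_pick() are hypothetical names made up
for the example. The zspage is detached from the LRU and marked
under_reclaim while the spinlock is held, and the lock that may block is
taken only after the spinlock has been dropped.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct zspage {
	struct zspage *prev, *next;	/* LRU links */
	bool under_reclaim;
	pthread_mutex_t page_lock;	/* stands in for the sleeping page locks */
};

struct zs_pool {
	pthread_spinlock_t lock;	/* stands in for pool->lock */
	struct zspage lru;		/* list head; lru.prev is the LRU tail */
};

static void pool_init(struct zs_pool *pool)
{
	int rc = pthread_spin_init(&pool->lock, PTHREAD_PROCESS_PRIVATE);

	assert(rc == 0);
	(void)rc;
	pool->lru.prev = pool->lru.next = &pool->lru;
}

static void zspage_init(struct zspage *zspage)
{
	zspage->prev = zspage->next = zspage;
	zspage->under_reclaim = false;
	pthread_mutex_init(&zspage->page_lock, NULL);
}

/* Insert at the head of the LRU, so the tail is the oldest entry. */
static void lru_add(struct zs_pool *pool, struct zspage *zspage)
{
	pthread_spin_lock(&pool->lock);
	zspage->next = pool->lru.next;
	zspage->prev = &pool->lru;
	pool->lru.next->prev = zspage;
	pool->lru.next = zspage;
	pthread_spin_unlock(&pool->lock);
}

/*
 * Detach the LRU tail and return it with its page lock held,
 * or NULL if the LRU is empty.
 */
static struct zspage *reclaim_pick(struct zs_pool *pool)
{
	struct zspage *zspage;

	pthread_spin_lock(&pool->lock);
	if (pool->lru.prev == &pool->lru) {
		pthread_spin_unlock(&pool->lock);
		return NULL;
	}

	zspage = pool->lru.prev;
	zspage->prev->next = zspage->next;
	zspage->next->prev = zspage->prev;
	zspage->prev = zspage->next = zspage;

	/* Marked while the spinlock is still held, as in the quoted patch. */
	zspage->under_reclaim = true;
	pthread_spin_unlock(&pool->lock);

	/* May block, so it is taken only after the spinlock is dropped. */
	pthread_mutex_lock(&zspage->page_lock);
	return zspage;
}
```

Whether under_reclaim alone is enough to keep the zspage stable in the
window between dropping pool->lock and taking the page locks is a question
for the real patch; the sketch does not try to answer it.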