On 2021-11-10 10:54:32 [-0800], Minchan Kim wrote:
> zsmalloc has used a bit in the zpage handle as a spin_lock to keep the
> zpage object alive during several operations. However, it causes
> problems for PREEMPT_RT and makes the code too complicated.
>
> This patch replaces the bit spin_lock with the pool->migrate_lock
> rwlock. It makes the code simpler and lets zsmalloc work under
> PREEMPT_RT.
>
> The drawback is that pool->migrate_lock has a coarser granularity than
> the per-zpage lock, so contention would be higher than before when
> both IO-related operations (i.e., zsmalloc, zsfree, zs_[map|unmap])
> and compaction (page/zpage migration) run in parallel (note: the
> migrate_lock is an rwlock and the IO-related functions all take the
> read side, so there is no contention among them). However, the
> write side is fast enough (the dominant overhead is just the page
> copy), so it shouldn't matter much. If the lock granularity becomes
> more of a problem later, we could introduce table locks keyed by the
> handle as a hash value.
>
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
…
> index b8b098be92fa..5d4c4d254679 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1789,6 +1767,11 @@ static void migrate_write_lock(struct zspage *zspage)
>  	write_lock(&zspage->lock);
>  }
>
> +static void migrate_write_lock_nested(struct zspage *zspage)
> +{
> +	write_lock_nested(&zspage->lock, SINGLE_DEPTH_NESTING);

I don't have this in my tree.

> +}
> +
>  static void migrate_write_unlock(struct zspage *zspage)
>  {
>  	write_unlock(&zspage->lock);
…
> @@ -2077,8 +2043,13 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  	struct zspage *dst_zspage = NULL;
>  	unsigned long pages_freed = 0;
>
> +	/* protect the race between zpage migration and zs_free */
> +	write_lock(&pool->migrate_lock);
> +	/* protect zpage allocation/free */
>  	spin_lock(&class->lock);
>  	while ((src_zspage = isolate_zspage(class, true))) {
> +		/* protect someone accessing the zspage(i.e., zs_map_object) */
> +		migrate_write_lock(src_zspage);
>
>  		if (!zs_can_compact(class))
>  			break;
> @@ -2087,6 +2058,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  		cc.s_page = get_first_page(src_zspage);
>
>  		while ((dst_zspage = isolate_zspage(class, false))) {
> +			migrate_write_lock_nested(dst_zspage);
> +
>  			cc.d_page = get_first_page(dst_zspage);
>  			/*
>  			 * If there is no more space in dst_page, resched

Looking at these two chunks: the zspage here comes from a list, and you
remove it from that list, which ensures that two threads can't lock the
very same pair of zspages in the reverse order, as in:

	migrate_write_lock(dst_zspage);
	…
	migrate_write_lock(src_zspage);

right?

Sebastian
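
For illustration, a minimal sketch of the locking pattern the two quoted
hunks set up. It is not the actual zsmalloc code: compact_pair() and the
reduced struct zspage below are made up for the example, and it assumes a
tree that already carries write_lock_nested()/SINGLE_DEPTH_NESTING, which
Sebastian notes his tree does not yet have. Both zspage locks are in the
same lockdep class, so the nested acquisition needs the annotation even
though the two objects are distinct.

	#include <linux/spinlock.h>	/* rwlock_t, write_lock(), write_lock_nested() */
	#include <linux/lockdep.h>	/* SINGLE_DEPTH_NESTING */

	/* Reduced to the lock only; the real struct zspage has more fields. */
	struct zspage {
		rwlock_t lock;
	};

	/*
	 * Hypothetical helper mirroring the __zs_compact() pattern: both
	 * locks belong to the same lockdep class, so the inner one is taken
	 * with SINGLE_DEPTH_NESTING to tell lockdep the nesting is intended.
	 */
	static void compact_pair(struct zspage *src, struct zspage *dst)
	{
		/* src was already unlinked from the class list by isolate_zspage() */
		write_lock(&src->lock);

		/*
		 * dst comes from the same list and is unlinked as well, so
		 * another thread cannot pick the same pair in the opposite
		 * order; the annotation only silences lockdep's same-class
		 * report.
		 */
		write_lock_nested(&dst->lock, SINGLE_DEPTH_NESTING);

		/* ... migrate objects from src to dst ... */

		write_unlock(&dst->lock);
		write_unlock(&src->lock);
	}

If the reading at the end of the mail is right (each zspage is taken off
the class list before being locked), the _nested() variant is purely a
lockdep annotation rather than a functional ordering requirement.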