On (25/02/04 17:19), Yosry Ahmed wrote:
> > sizeof(struct zs_page) change is one thing. Another thing is that
> > zspage->lock is taken from atomic sections, pretty much everywhere.
> > compaction/migration write-lock it under pool rwlock and class spinlock,
> > but both compaction and migration now EAGAIN if the lock is locked
> > already, so that is sorted out.
> >
> > The remaining problem is map(), which takes zspage read-lock under pool
> > rwlock. RFC series (which you hated with passion :P) converted all zsmalloc
> > into preemptible ones because of this - zspage->lock is a nested leaf-lock,
> > so it cannot schedule unless locks it's nested under permit it (needless to
> > say neither rwlock nor spinlock permit it).
>
> Hmm, so we want the lock to be preemptible, but we don't want to use an
> existing preemptible lock because it may be held from atomic context.
>
> I think one problem here is that the lock you are introducing is a
> spinning lock but the lock holder can be preempted. This is why spinning
> locks do not allow preemption. Others waiting for the lock can spin
> waiting for a process that is scheduled out.
>
> For example, the compaction/migration code could be sleeping holding the
> write lock, and a map() call would spin waiting for that sleeping task.

write-lock holders cannot sleep, that's the key part.

So the rules are:

1) writer cannot sleep
   - migration/compaction runs in atomic context and grabs
     write-lock only from atomic context
   - write-locking function disables preemption before lock(), just to be
     safe, and enables it after unlock()

2) writer does not spin waiting
   - that's why there is only write_try_lock function
   - compaction and migration bail out when they cannot lock the
     zspage

3) readers can sleep and can spin waiting for a lock
   - other (even preempted) readers don't block new readers
   - writers don't sleep, they always unlock

> I wonder if there's a way to rework the locking instead to avoid the
> nesting. It seems like sometimes we lock the zspage with the pool lock
> held, sometimes with the class lock held, and sometimes with no lock
> held.
>
> What are the rules here for acquiring the zspage lock?

Most of that code is not written by me, but I think the rule is to disable
"migration", be it via pool lock or class lock.

> Do we need to hold another lock just to make sure the zspage does not go
> away from under us?

Yes, the page cannot go away via "normal" path:
	zs_free(last object) -> zspage becomes empty -> free zspage

so when we have active mapping() it's only migration and compaction
that can free zspage (its content is migrated and so it becomes empty).

> Can we use RCU or something similar to do that instead?

Hmm, I don't know... zsmalloc is not "read-mostly", it's whatever data
patterns the clients have. I suspect we'd need to synchronize RCU every
time a zspage is freed: zs_free() [this one is complicated], or migration,
or compaction? Sounds like anti-pattern for RCU?
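
To make the three locking rules listed earlier in this reply a bit more concrete, here is a rough user-space model in C11 atomics. It is only a sketch, not the code from the series: every name in it (sketch_lock, sketch_read_lock, sketch_write_trylock and so on) is made up for illustration, and it cannot model preemption at all. In the kernel the write side would additionally disable preemption around the trylock/unlock pair; here that is only noted in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not the zsmalloc implementation. */
#define SKETCH_WRLOCKED	(-1)

struct sketch_lock {
	/* 0: unlocked, > 0: number of readers, -1: write-locked */
	atomic_int state;
};

static void sketch_lock_init(struct sketch_lock *l)
{
	atomic_init(&l->state, 0);
}

/*
 * Rule 3: readers may spin. The only thing they ever wait for is a
 * writer, and a writer never sleeps, so the wait is short. Other
 * readers (even preempted ones) never block a new reader.
 */
static void sketch_read_lock(struct sketch_lock *l)
{
	int old = atomic_load_explicit(&l->state, memory_order_relaxed);

	for (;;) {
		if (old == SKETCH_WRLOCKED) {
			/* A writer holds the lock: reload and retry. */
			old = atomic_load_explicit(&l->state,
						   memory_order_relaxed);
			continue;
		}
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&l->state, &old,
							  old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;
	}
}

static void sketch_read_unlock(struct sketch_lock *l)
{
	atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
}

/*
 * Rules 1 and 2: there is only a trylock for writers. It either takes
 * the lock immediately or fails, and the caller (compaction/migration
 * in the discussion above) bails out. A kernel version would disable
 * preemption here and re-enable it in unlock; user space has no
 * equivalent, so this model leaves that out.
 */
static bool sketch_write_trylock(struct sketch_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->state, &expected,
						       SKETCH_WRLOCKED,
						       memory_order_acquire,
						       memory_order_relaxed);
}

static void sketch_write_unlock(struct sketch_lock *l)
{
	atomic_store_explicit(&l->state, 0, memory_order_release);
}

/* Single-threaded walk through the rules. */
static void sketch_demo(void)
{
	struct sketch_lock l;

	sketch_lock_init(&l);

	sketch_read_lock(&l);			/* reader 1, think map() */
	sketch_read_lock(&l);			/* reader 2 is not blocked */
	assert(!sketch_write_trylock(&l));	/* writer fails, bails out */

	sketch_read_unlock(&l);
	sketch_read_unlock(&l);
	assert(sketch_write_trylock(&l));	/* no readers: writer wins */
	sketch_write_unlock(&l);
}
```

The point of the model is the asymmetry: the read side has a blocking lock function, the write side deliberately does not, so nothing ever spins waiting for a holder that may be scheduled out.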