On (25/02/12 17:14), Yosry Ahmed wrote:
> On Wed, Feb 12, 2025 at 03:27:10PM +0900, Sergey Senozhatsky wrote:
> > Switch over from rwlock_t to an atomic_t variable that takes negative
> > value when the page is under migration, or positive values when the
> > page is used by zsmalloc users (object map, etc.)  Using a rwsem
> > per-zspage is a little too memory heavy, a simple atomic_t should
> > suffice.
>
> We should also explain that rwsem cannot be used due to the locking
> context (we need to hold it in an atomic context). Basically what you
> explained to me before :)
>
> > zspage lock is a leaf lock for zs_map_object(), where it's read-acquired.
> > Since this lock now permits preemption extra care needs to be taken when
> > it is write-acquired - all writers grab it in atomic context, so they
> > cannot spin and wait for (potentially preempted) reader to unlock zspage.
> > There are only two writers at this moment - migration and compaction.  In
> > both cases we use write-try-lock and bail out if zspage is read locked.
> > Writers, on the other hand, never get preempted, so readers can spin
> > waiting for the writer to unlock zspage.
>
> The details are important, but I think we want to concisely state the
> problem statement either before or after. Basically we want a lock that
> we *never* sleep while acquiring but *can* sleep while holding in read
> mode. This allows holding the lock from any context, but also being
> preemptible if the context allows it.

Ack.

[..]
> > +/*
> > + * zspage locking rules:
>
> Also here we need to state our key rule:
> Never sleep while acquiring, preemptible while holding (if possible). The
> following rules are basically how we make sure we keep this true.
>
> > + *
> > + * 1) writer-lock is exclusive
> > + *
> > + * 2) writer-lock owner cannot sleep
> > + *
> > + * 3) writer-lock owner cannot spin waiting for the lock
> > + *   - caller (e.g. compaction and migration) must check return value and
> > + *     handle locking failures
> > + *   - there is only TRY variant of writer-lock function
> > + *
> > + * 4) reader-lock owners (multiple) can sleep
> > + *
> > + * 5) reader-lock owners can spin waiting for the lock, in any context
> > + *   - existing readers (even preempted ones) don't block new readers
> > + *   - writer-lock owners never sleep, always unlock at some point
>
>
> May I suggest something more concise and to the point?
>
> /*
>  * The zspage lock can be held from atomic contexts, but it needs to remain
>  * preemptible when held for reading because it remains held outside of those
>  * atomic contexts, otherwise we unnecessarily lose preemptibility.
>  *
>  * To achieve this, the following rules are enforced on readers and writers:
>  *
>  * - Writers are blocked by both writers and readers, while readers are only
>  *   blocked by writers (i.e. normal rwlock semantics).
>  *
>  * - Writers are always atomic (to allow readers to spin waiting for them).
>  *
>  * - Writers always use trylock (as the lock may be held by sleeping readers).
>  *
>  * - Readers may spin on the lock (as they can only wait for atomic writers).
>  *
>  * - Readers may sleep while holding the lock (as writes only use trylock).
>  */

Looks good, thanks.

> > + */
> > +static void zspage_read_lock(struct zspage *zspage)
> > +{
> > +	atomic_t *lock = &zspage->lock;
> > +	int old = atomic_read_acquire(lock);
> > +
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	rwsem_acquire_read(&zspage->lockdep_map, 0, 0, _RET_IP_);
> > +#endif
> > +
> > +	do {
> > +		if (old == ZS_PAGE_WRLOCKED) {
> > +			cpu_relax();
> > +			old = atomic_read_acquire(lock);
> > +			continue;
> > +		}
> > +	} while (!atomic_try_cmpxchg_acquire(lock, &old, old + 1));
> > +}
> > +
> > +static void zspage_read_unlock(struct zspage *zspage)
> > +{
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +	rwsem_release(&zspage->lockdep_map, _RET_IP_);
> > +#endif
> > +	atomic_dec_return_release(&zspage->lock);
> > +}
> > +
> > +static __must_check bool zspage_try_write_lock(struct zspage *zspage)
>
> I believe zspage_write_trylock() would be closer to the normal rwlock
> naming.

It derived its name from rwsem "age".  Can rename.

> > +{
> > +	atomic_t *lock = &zspage->lock;
> > +	int old = ZS_PAGE_UNLOCKED;
> > +
> > +	WARN_ON_ONCE(preemptible());
>
> Hmm I know I may have been the one suggesting this, but do we actually
> need it? We disable preemption explicitly anyway before holding the
> lock.

This is just to make sure that the precondition for
"writer is always atomic" is satisfied.  But I can drop it.

> >  	size_class_lock(class);
> > -	/* the migrate_write_lock protects zpage access via zs_map_object */
> > -	migrate_write_lock(zspage);
> > +	/* the zspage write_lock protects zpage access via zs_map_object */
> > +	if (!zspage_try_write_lock(zspage)) {
> > +		size_class_unlock(class);
> > +		pool_write_unlock(pool);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* We're committed, tell the world that this is a Zsmalloc page. */
> > +	__zpdesc_set_zsmalloc(newzpdesc);
>
> We used to do this earlier on, before any locks are held. Why is it
> moved here?

I want to do that only if the zspage write-trylock has succeeded
(we didn't have any error out paths before).
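
For illustration only, the reader/writer rules agreed on above can be
modelled in userspace with C11 atomics. This is a rough sketch, not the
kernel code from the patch: every sk_* name is hypothetical, lockdep
annotations and cpu_relax() are left out, and the read-lock loop is written
so that a write-locked value is always re-checked before the
compare-exchange is attempted.

```c
/* Userspace model of the zspage lock rules; NOT the kernel implementation. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names, mirroring ZS_PAGE_UNLOCKED / ZS_PAGE_WRLOCKED. */
#define SK_UNLOCKED	0
#define SK_WRLOCKED	(-1)

struct sk_lock {
	atomic_int cnt;	/* -1: write-locked, 0: free, N > 0: N readers */
};

/* Readers may spin: they only ever wait for a (non-sleeping) writer. */
static void sk_read_lock(struct sk_lock *l)
{
	int old = atomic_load_explicit(&l->cnt, memory_order_acquire);

	for (;;) {
		if (old == SK_WRLOCKED) {
			/* Re-read and re-check before trying to take a reference. */
			old = atomic_load_explicit(&l->cnt, memory_order_acquire);
			continue;
		}
		/* On failure 'old' is refreshed with the current counter value. */
		if (atomic_compare_exchange_weak_explicit(&l->cnt, &old, old + 1,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;
	}
}

static void sk_read_unlock(struct sk_lock *l)
{
	atomic_fetch_sub_explicit(&l->cnt, 1, memory_order_release);
}

/* Writers never wait: readers may be sleeping, so only a trylock exists. */
static bool sk_write_trylock(struct sk_lock *l)
{
	int old = SK_UNLOCKED;

	return atomic_compare_exchange_strong_explicit(&l->cnt, &old, SK_WRLOCKED,
							memory_order_acquire,
							memory_order_relaxed);
}

static void sk_write_unlock(struct sk_lock *l)
{
	atomic_store_explicit(&l->cnt, SK_UNLOCKED, memory_order_release);
}

/* Single-threaded walk through the rules; returns the final counter value. */
int sk_demo(void)
{
	struct sk_lock l;
	bool got;

	atomic_init(&l.cnt, SK_UNLOCKED);

	sk_read_lock(&l);
	sk_read_lock(&l);		/* readers do not block readers */
	got = sk_write_trylock(&l);
	assert(!got);			/* writer bails out while readers hold it */

	sk_read_unlock(&l);
	sk_read_unlock(&l);
	got = sk_write_trylock(&l);
	assert(got);			/* lock is free, writer gets it */
	got = sk_write_trylock(&l);
	assert(!got);			/* writer-lock is exclusive */

	sk_write_unlock(&l);
	return atomic_load(&l.cnt);
}
```

A caller on the writer side is expected to handle a failed sk_write_trylock()
the same way the migration path above does: drop the locks it already holds
and return an error.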