Cc-ing Sebastian and Mike

On (25/01/31 14:55), Andrew Morton wrote:
> > +static void zram_slot_write_lock(struct zram *zram, u32 index)
> > +{
> > +        atomic_t *lock = &zram->table[index].lock;
> > +        int old = atomic_read(lock);
> > +
> > +        do {
> > +                if (old != ZRAM_ENTRY_UNLOCKED) {
> > +                        cond_resched();
> > +                        old = atomic_read(lock);
> > +                        continue;
> > +                }
> > +        } while (!atomic_try_cmpxchg(lock, &old, ZRAM_ENTRY_WRLOCKED));
> > +}
>
> I expect that if the calling userspace process has realtime policy (eg
> SCHED_FIFO) then the cond_resched() won't schedule SCHED_NORMAL tasks
> and this becomes a busy loop.  And if the machine is single-CPU, the
> loop is infinite.
>
> I do agree that for inventing new locking schemes, the bar is set
> really high.

So I completely reworked this bit and we don't have that problem anymore,
nor the problem of "inventing locking schemes in 2025".

In short, we are returning to bit-locks, which zram had been using until
commit 9518e5bfaae19 ("zram: Replace bit spinlocks with a spinlock_t"),
except not bit-spinlocks this time around (those won't work with linux-rt)
but wait_on_bit and friends.

The entry lock is exclusive, just like before, but the lock owner can now
sleep; any task wishing to lock that same entry will wait to be woken up
by the current lock owner once it unlocks the entry.  For cases when the
lock is taken from atomic context (e.g. slot-free notification from
softirq) we continue using the TRY lock variant, which was introduced by
commit 3c9959e025472 ("zram: fix lockdep warning of free block handling"),
so there's nothing new here.  In addition, I added some lockdep
annotations, just to be safe.

There shouldn't be too many tasks competing for the same entry.  I can
only think of cases where read/write (or slot-free notification, if zram
is used as a swap device) races with writeback or recompression (we
cannot have writeback and recompression running simultaneously).

It currently looks like this:

---

struct zram_table_entry {
        unsigned long handle;
        unsigned long flags;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        struct lockdep_map lockdep_map;
#endif
#ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
        ktime_t ac_time;
#endif
};

/*
 * entry locking rules:
 *
 * 1) Lock is exclusive
 *
 * 2) lock() function can sleep waiting for the lock
 *
 * 3) Lock owner can sleep
 *
 * 4) Use TRY lock variant when in atomic context
 *    - must check return value and handle locking failures
 */

static __must_check bool zram_slot_try_lock(struct zram *zram, u32 index)
{
        unsigned long *lock = &zram->table[index].flags;

        if (!test_and_set_bit_lock(ZRAM_ENTRY_LOCK, lock)) {
#ifdef CONFIG_DEBUG_LOCK_ALLOC
                mutex_acquire(&zram->table[index].lockdep_map, 0, 0, _RET_IP_);
#endif
                return true;
        }
        return false;
}

static void zram_slot_lock(struct zram *zram, u32 index)
{
        unsigned long *lock = &zram->table[index].flags;

        WARN_ON_ONCE(!preemptible());

        wait_on_bit_lock(lock, ZRAM_ENTRY_LOCK, TASK_UNINTERRUPTIBLE);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        mutex_acquire(&zram->table[index].lockdep_map, 0, 0, _RET_IP_);
#endif
}

static void zram_slot_unlock(struct zram *zram, u32 index)
{
        unsigned long *lock = &zram->table[index].flags;

        clear_and_wake_up_bit(ZRAM_ENTRY_LOCK, lock);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
        mutex_release(&zram->table[index].lockdep_map, _RET_IP_);
#endif
}
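
To make rule 4) concrete: the atomic-context caller pattern stays basically
what it is today.  A sketch based on the current zram_slot_free_notify()
path, just adapted to the new function name (this is illustrative and not
part of the diff above):

static void zram_slot_free_notify(struct block_device *bdev,
                                  unsigned long index)
{
        struct zram *zram = bdev->bd_disk->private_data;

        atomic64_inc(&zram->stats.notify_free);

        /* softirq context: cannot sleep, so only try to lock the entry */
        if (!zram_slot_try_lock(zram, index)) {
                /* the slot stays allocated; account the missed free */
                atomic64_inc(&zram->stats.miss_free);
                return;
        }

        zram_free_page(zram, index);
        zram_slot_unlock(zram, index);
}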
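
And since the lockdep annotations only show up above as mutex_acquire()/
mutex_release(): the per-entry lockdep_map has to be initialized when the
table is allocated.  A rough sketch of what that can look like (the helper
name and the single shared lock_class_key are illustrative; it would be
called per index from meta allocation):

#ifdef CONFIG_DEBUG_LOCK_ALLOC
static void zram_slot_lock_init(struct zram *zram, u32 index)
{
        /* one lockdep class shared by all entry locks */
        static struct lock_class_key key;

        lockdep_init_map(&zram->table[index].lockdep_map,
                         "zram->table[index].lock", &key, 0);
}
#else
static void zram_slot_lock_init(struct zram *zram, u32 index)
{
}
#endif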