On Tue, Apr 16, 2024 at 10:14:21AM +0200, Peter Zijlstra wrote:

> > Some aspects of the implementation may deserve particular comment:
> >
> > * In the interest of performance, each object is governed only by a single
> > spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
> > objects be changed as a single atomic operation. In order to achieve this, we
> > first take a device-wide lock ("wait_all_lock") any time we are going to lock
> > more than one object at a time.
> >
> > The maximum number of objects that can be used in a vectored wait, and
> > therefore the maximum that can be locked simultaneously, is 64. This number is
> > NT's own limit.

AFAICT:

  spin_lock(&dev->wait_all_lock);
    list_for_each_entry(entry, &obj->all_waiters, node)
      for (i=0; i<count; i++)
        spin_lock_nest_lock(q->entries[i].obj->lock, &dev->wait_all_lock);

Where @count <= NTSYNC_MAX_WAIT_COUNT.

So while this nests at most 65 spinlocks, there is no actual bound on the
number of nested lock sections in total. That is, the all_waiters list can
grow without limit.

Can we pretty please make wait_all_lock a mutex?

> > The acquisition of multiple spinlocks will degrade performance. This is a
> > conscious choice, however. Wait-for-all is known to be a very rare operation
> > in practice, especially with counts that approach the maximum, and it is the
> > intent of the ntsync driver to optimize wait-for-any at the expense of
> > wait-for-all as much as possible.

Typical sane usage is a good guide for performance, but you must not forget
about malicious userspace and what it can do on purpose to mess you up.

Anyway, let me stare more at all this....
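
For illustration only, here is a rough sketch of what the mutex variant of
the outer lock might look like. The struct layout and the ntsync_lock_all()
helper are guesses based on the snippet above, not the actual driver code;
the point is just the lock ordering:

/*
 * Sketch, not the driver: wait_all_lock becomes a mutex, the per-object
 * locks stay spinlocks and nest under it via lockdep's nest_lock
 * annotation, bounded by NTSYNC_MAX_WAIT_COUNT (64).
 */
#include <linux/mutex.h>
#include <linux/spinlock.h>

struct ntsync_obj {
	spinlock_t lock;		/* per-object lock, unchanged */
};

struct ntsync_device {
	struct mutex wait_all_lock;	/* was: spinlock_t */
};

/* Take the device-wide lock, then the (at most 64) object locks. */
static void ntsync_lock_all(struct ntsync_device *dev,
			    struct ntsync_obj **objs, unsigned int count)
{
	unsigned int i;

	mutex_lock(&dev->wait_all_lock);
	for (i = 0; i < count; i++)
		spin_lock_nest_lock(&objs[i]->lock, &dev->wait_all_lock);
}

With the outer lock a mutex, the unbounded walk over all_waiters would happen
in a sleepable section; only the per-object spin sections, bounded by
NTSYNC_MAX_WAIT_COUNT, would remain non-preemptible.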