Re: [PATCH v4 00/30] NT synchronization primitive driver

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Tue, 16 Apr 2024 17:53:45 +0200

On Tue, Apr 16, 2024 at 05:50:14PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 16, 2024 at 10:14:21AM +0200, Peter Zijlstra wrote:
> 
> > > Some aspects of the implementation may deserve particular comment:
> > > 
> > > * In the interest of performance, each object is governed only by a single
> > >   spinlock. However, NTSYNC_IOC_WAIT_ALL requires that the state of multiple
> > >   objects be changed as a single atomic operation. In order to achieve this, we
> > >   first take a device-wide lock ("wait_all_lock") any time we are going to lock
> > >   more than one object at a time.
> > > 
> > >   The maximum number of objects that can be used in a vectored wait, and
> > >   therefore the maximum that can be locked simultaneously, is 64. This number is
> > >   NT's own limit.
> 
> AFAICT:
> 
> 	spin_lock(&dev->wait_all_lock);
> 	  list_for_each_entry(entry, &obj->all_waiters, node)
> 	    for (i = 0; i < count; i++)
> 	      spin_lock_nest_lock(&q->entries[i].obj->lock, &dev->wait_all_lock);
> 
> Where @count <= NTSYNC_MAX_WAIT_COUNT.
> 
> So while this nests at most 65 spinlocks, there is no actual bound on
> the total number of nested lock sections. That is, the all_waiters list
> can grow without limit.
> 
> Can we pretty please make wait_all_lock a mutex ?

Hurmph, it's worse: you do that list walk while holding some obj->lock
spinlock too. Still need to figure out how all that works...
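
To make the counting argument above concrete, here is a rough userspace
model of the quoted loop. It is only a sketch: pthread mutexes stand in
for the kernel spinlocks, and every name in it (model_obj, model_waiter,
model_stats, model_walk_all_waiters, MODEL_MAX_WAIT_COUNT) is made up
for illustration and is not taken from the patch. It also assumes each
waiter releases its object locks before the next list entry is visited,
and that the objects within one waiter are distinct.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-in for NTSYNC_MAX_WAIT_COUNT (64, per the cover letter). */
#define MODEL_MAX_WAIT_COUNT 64

/* Hypothetical object: one lock per object, as in the cover letter. */
struct model_obj {
	pthread_mutex_t lock;
};

/* Hypothetical waiter: the objects of one vectored wait (must be distinct). */
struct model_waiter {
	struct model_obj *objs[MODEL_MAX_WAIT_COUNT];
	size_t count;
};

struct model_stats {
	size_t max_held;       /* peak number of locks held at the same time */
	size_t total_acquired; /* lock acquisitions inside one wait_all_lock section */
};

/*
 * Walk every waiter while holding wait_all_lock, taking and then dropping
 * all of that waiter's object locks, and count what happens.
 */
static struct model_stats model_walk_all_waiters(pthread_mutex_t *wait_all_lock,
						 struct model_waiter *waiters,
						 size_t n_waiters)
{
	struct model_stats st = { 0, 0 };
	size_t held = 0, w, i;

	pthread_mutex_lock(wait_all_lock);
	held++;
	st.total_acquired++;
	st.max_held = held;

	for (w = 0; w < n_waiters; w++) {
		struct model_waiter *q = &waiters[w];

		assert(q->count <= MODEL_MAX_WAIT_COUNT);

		/* Nest up to MODEL_MAX_WAIT_COUNT object locks under wait_all_lock. */
		for (i = 0; i < q->count; i++) {
			pthread_mutex_lock(&q->objs[i]->lock);
			held++;
			st.total_acquired++;
			if (held > st.max_held)
				st.max_held = held;
		}

		/* Drop them in reverse order before moving to the next waiter. */
		for (i = q->count; i > 0; i--) {
			pthread_mutex_unlock(&q->objs[i - 1]->lock);
			held--;
		}
	}

	pthread_mutex_unlock(wait_all_lock);
	return st;
}
```

In this model max_held never exceeds 1 + MODEL_MAX_WAIT_COUNT, while
total_acquired grows linearly with the number of waiters on the list,
all inside a single wait_all_lock critical section.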