On Mon, Apr 20, 2020 at 05:35:20PM +0200, Manfred Spraul wrote:
> > -	max_idx = max(ids->in_use*3/2, ipc_min_cycle);
> > -	max_idx = min(max_idx, ipc_mni);
> > -
> > -	/* allocate the idx, with a NULL struct kern_ipc_perm */
> > -	idx = idr_alloc_cyclic(&ids->ipcs_idr, NULL, 0, max_idx,
> > -			       GFP_NOWAIT);
> > -
> > -	if (idx >= 0) {
> > -		/*
> > -		 * idx got allocated successfully.
> > -		 * Now calculate the sequence number and set the
> > -		 * pointer for real.
> > -		 */
> > -		if (idx <= ids->last_idx) {
> > +	min_idx = ids->next_idx;
> > +	new->seq = ids->seq;
> > +
> > +	/* Modified version of __xa_alloc */
> > +	do {
> > +		xas.xa_index = min_idx;
> > +		xas_find_marked(&xas, max_idx, XA_FREE_MARK);
> > +		if (xas.xa_node == XAS_RESTART && min_idx > 0) {
> >  			ids->seq++;
> >  			if (ids->seq >= ipcid_seq_max())
> >  				ids->seq = 0;
> > +			new->seq = ids->seq;
> > +			xas.xa_index = 0;
> > +			min_idx = 0;
> > +			xas_find_marked(&xas, max_idx, XA_FREE_MARK);
> >  		}
>
> Is it necessary to have that many details of the XArray in ipc/util?
>
> This function is not performance critical.
>
> The core requirement is that ipc_obtain_object_check() must scale.
>
> Would it be possible to use something like
> 	xa_alloc(,entry=NULL,)
> 	new->seq = ...
> 	xa_store(,entry=new,);

Yes, that would absolutely be possible, and some users do exactly that
(a rough sketch of that two-step approach is appended below).  I thought
that creating a new message queue / semaphore set / shared memory
segment would be performance sensitive.

> > -		ids->last_idx = idx;
> > -
> > -		new->seq = ids->seq;
> > -		/* no need for smp_wmb(), this is done
> > -		 * inside idr_replace, as part of
> > -		 * rcu_assign_pointer
> > -		 */
>
> Could you leave the memory barrier comments in the code?
>
> The rcu_assign_pointer() is the first hand-off from semget() or
> msgget().
>
> Before the rcu_assign_pointer, e.g. semop() calls would return -EINVAL;
> after the rcu_assign_pointer, semwhatever() must work - and e.g. the
> permission checks are lockless.

How about simply:

	/* xa_store contains a memory barrier */

> > -		idr_replace(&ids->ipcs_idr, new, idx);
> > -	}
> > +		if (xas.xa_node == XAS_RESTART)
> > +			xas_set_err(&xas, -ENOSPC);
> > +		else
> > +			new->id = (new->seq << ipcmni_seq_shift()) +
> > +					xas.xa_index;
>
> Setting new->id should remain at the end, outside any locking:
>
> The variable has no special protection, access is only allowed after
> proper locking, thus there is no need to have the initialization in
> the middle.
>
> What is crucial is that the final value of new->seq is visible to all
> cpus before storing the pointer.

The IPC locking is weird.  Most users spin_lock the xarray/idr/radix
tree for modifications, and on the read-side use RCU to protect the
lookup and a refcount on the object looked up in it (after which, RCU
is unlocked); a sketch of that common pattern also follows below.  IPC
seems to hold the RCU lock much longer, and it has a per-object
spinlock rather than a refcount.  And it has an rwsem.  It feels like
it could be much simpler, but I haven't had time to dig into it and
figure out why it's so different from all the other users.  Maybe it's
just older code.

> > +		xas_store(&xas, new);
> > +		xas_clear_mark(&xas, XA_FREE_MARK);
> > +	} while (__xas_nomem(&xas, GFP_KERNEL));
> > +
>
> Just for my curiosity:
>
> If the xas is in an invalid state, then xas_store() will not store
> anything.  Thus the loop will not store "new" multiple times, it will
> be stored only once.

Correct, although we're going to delete this loop entirely.
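For concreteness, a rough, untested sketch of that two-step
xa_alloc()/xa_store() idea (the function name is made up, locking and
most error handling are elided, and it assumes ids->ipcs is an
allocating XArray as in this patch):

static int ipc_id_alloc_sketch(struct ipc_ids *ids,
			       struct kern_ipc_perm *new)
{
	u32 idx;
	int err;

	/* Reserve an index; lookups read back NULL until we publish. */
	err = xa_alloc(&ids->ipcs, &idx, NULL,
		       XA_LIMIT(0, ipc_mni - 1), GFP_KERNEL);
	if (err < 0)
		return err;

	new->seq = ids->seq;
	new->id = (new->seq << ipcmni_seq_shift()) + idx;

	/*
	 * Publish the fully initialised object.  xa_store() contains
	 * the release barrier, so a reader which finds the pointer
	 * also sees the final value of new->seq.
	 */
	xa_store(&ids->ipcs, idx, new, GFP_KERNEL);
	return idx;
}

xa_alloc_cyclic() could be substituted for xa_alloc() if we want to
keep the current cyclic allocation behaviour.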
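And the read-side pattern those other users follow, again only as an
illustrative sketch (struct obj and obj_lookup() are invented names,
not existing code):

struct obj {
	refcount_t refcount;
	/* ... */
};

static struct obj *obj_lookup(struct xarray *xa, unsigned long id)
{
	struct obj *obj;

	rcu_read_lock();
	obj = xa_load(xa, id);
	/* Pin the object while RCU guarantees it isn't freed under us. */
	if (obj && !refcount_inc_not_zero(&obj->refcount))
		obj = NULL;
	rcu_read_unlock();

	/* Caller uses obj without RCU held, then drops the reference. */
	return obj;
}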
> > @@ -472,7 +487,7 @@ void ipc_rmid(struct ipc_ids *ids, struct kern_ipc_perm *ipcp)
> >  			idx--;
> >  			if (idx == -1)
> >  				break;
> > -		} while (!idr_find(&ids->ipcs_idr, idx));
> > +		} while (!xa_load(&ids->ipcs, idx));
> >  		ids->max_idx = idx;
> >  	}
> >  }
>
> Is there an xa_find_last() function?
>
> It is outside of any hot path; I have a patch that does a binary
> search with idr_get_next().
>
> If there is no xa_find_last(), then I would rebase that patch.

There is not currently an xa_find_last().  It shouldn't be too hard to
add; start at the top of the tree and walk backwards in each node until
finding a non-NULL entry.  Of course, it'll need documentation and test
cases.
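Until such a function exists, a naive stand-in built on the public API
would do for a cold path like this (an O(n) forward scan, not the
reverse node walk described above; the name is hypothetical):

static void *xa_find_last_naive(struct xarray *xa, unsigned long *idxp)
{
	unsigned long index;
	void *entry, *last = NULL;

	/* Remember the last present entry seen during a full scan. */
	xa_for_each(xa, index, entry) {
		last = entry;
		*idxp = index;
	}
	return last;
}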