Hello,

On Mon, 2022-11-28 at 16:06 -0800, Jacob Keller wrote:
> > @@ -217,6 +218,12 @@ struct eventpoll {
> >  	u64 gen;
> >  	struct hlist_head refs;
> >  
> > +	/*
> > +	 * usage count, protected by mtx, used together with epitem->dying to
> > +	 * orchestrate the disposal of this struct
> > +	 */
> > +	unsigned int refcount;
> > +
> 
> Why not use a kref (or at least struct refcount?) those provide some
> guarantees like guaranteeing atomic operations and saturation when the
> refcount value would overflow.

Thank you for the feedback! I thought about that option and ultimately
opted otherwise because we can avoid the additional atomic operations
required by kref/refcount_t: the reference count is always touched
under the ep->mtx mutex, so reasonably this does not introduce
performance regressions even in the stranger corner cases.

The above was explicitly noted in the previous revision's commit
message, but I removed it by mistake while updating the message for v3.

I can switch to kref if there is agreement WRT such a performance
trade-off. Another option would be adding a couple of additional checks
for wrap-arounds in ep_put() and ep_get(), so that we get similar
safety guarantees but no additional atomic operations.
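For concreteness, that second option could look roughly like the
following - a completely untested sketch, where the lockdep assertions
and the exact WARN conditions are only illustrative:

/*
 * Sketch: keep the plain counter serialized by ep->mtx (no atomics),
 * but saturate on overflow and warn on underflow, roughly mirroring
 * the guarantees refcount_t would give us.
 */
static void ep_get(struct eventpoll *ep)
{
	lockdep_assert_held(&ep->mtx);

	/* saturate instead of wrapping around to zero */
	if (WARN_ON_ONCE(ep->refcount == UINT_MAX))
		return;

	ep->refcount++;
}

/*
 * Returns true if the event poll can be disposed
 */
static bool ep_put(struct eventpoll *ep)
{
	lockdep_assert_held(&ep->mtx);

	/* warn on underflow and leak rather than risk a use-after-free */
	if (WARN_ON_ONCE(!ep->refcount))
		return false;

	if (--ep->refcount)
		return false;

	WARN_ON_ONCE(!RB_EMPTY_ROOT(&ep->rbr.rb_root));
	return true;
}

That would retain saturation semantics similar to refcount_t while
still avoiding the atomic operations, at the cost of open-coding the
checks.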
> >  #ifdef CONFIG_NET_RX_BUSY_POLL
> >  	/* used to track busy poll napi_id */
> >  	unsigned int napi_id;
> > @@ -240,9 +247,7 @@ struct ep_pqueue {
> >  /* Maximum number of epoll watched descriptors, per user */
> >  static long max_user_watches __read_mostly;
> >  
> > -/*
> > - * This mutex is used to serialize ep_free() and eventpoll_release_file().
> > - */
> > +/* Used for cycles detection */
> >  static DEFINE_MUTEX(epmutex);
> >  
> >  static u64 loop_check_gen = 0;
> > @@ -555,8 +560,7 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
> >  
> >  /*
> >   * This function unregisters poll callbacks from the associated file
> > - * descriptor. Must be called with "mtx" held (or "epmutex" if called from
> > - * ep_free).
> > + * descriptor. Must be called with "mtx" held.
> >   */
> >  static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
> >  {
> > @@ -679,11 +683,38 @@ static void epi_rcu_free(struct rcu_head *head)
> >  	kmem_cache_free(epi_cache, epi);
> >  }
> >  
> > +static void ep_get(struct eventpoll *ep)
> > +{
> > +	ep->refcount++;
> > +}
> 
> This would become something like "kref_get(&ep->kref)" or maybe even
> something like "kref_get_unless_zero" or some other form depending on
> exactly how you acquire a pointer to an eventpoll structure.

No need for kref_get_unless_zero here: in all ep_get() call sites we
know that at least one other reference is alive and can't go away
concurrently.

> > +
> > +/*
> > + * Returns true if the event poll can be disposed
> > + */
> > +static bool ep_put(struct eventpoll *ep)
> > +{
> > +	if (--ep->refcount)
> > +		return false;
> > +
> > +	WARN_ON_ONCE(!RB_EMPTY_ROOT(&ep->rbr.rb_root));
> > +	return true;
> > +}
> 
> This could become kref_put(&ep->kref, ep_dispose).

I think it would be necessary to release the ep->mtx mutex before
invoking ep_dispose()...

> > +
> > +static void ep_dispose(struct eventpoll *ep)
> > +{
> > +	mutex_destroy(&ep->mtx);
> > +	free_uid(ep->user);
> > +	wakeup_source_unregister(ep->ws);
> > +	kfree(ep);
> > +}
> 
> This would take a kref pointer, use container_of to get to the
> eventpoll structure, and then perform necessary cleanup once all
> references drop.
> 
> The exact specific steps here and whether it would still be safe to
> call mutex_destroy is a bit unclear since you typically would only
> call mutex_destroy when it's absolutely sure that no one has locked
> the mutex.

... due to the above: the current patch code ensures that ep_dispose()
is called only after the ep->mtx mutex is released.

> See Documentation/core-api/kref.rst for a better overview of the API
> and how to use it safely. I suspect that with just kref you could
> also safely avoid the "dying" flag as well, but I am not 100% sure.

I *think* we will still need the 'dying' flag, otherwise ep_free()
can't tell if the traversed epitem entries still hold a reference to
struct eventpoll - eventpoll_release_file() and ep_free() could
potentially try to release the same reference twice, and kref could
detect that race only for the last reference.

Thanks!

Paolo