Re: [PATCH v3] epoll: use refcount to reduce ep_mutex contention

On 11/29/2022 1:05 AM, Paolo Abeni wrote:
Hello,

On Mon, 2022-11-28 at 16:06 -0800, Jacob Keller wrote:
@@ -217,6 +218,12 @@ struct eventpoll {
   	u64 gen;
   	struct hlist_head refs;
+ /*
+	 * usage count, protected by mtx, used together with epitem->dying to
+	 * orchestrate the disposal of this struct
+	 */
+	unsigned int refcount;
+

Why not use a kref (or at least struct refcount?) those provide some
guarantees like guaranteeing atomic operations and saturation when the
refcount value would overflow.

Thank you for the feedback!

I thought about that option and ultimately opted otherwise, because we
can avoid the additional atomic operations required by kref/refcount_t:
the reference count is only ever touched under the ep->mtx mutex.
Avoiding the atomics should ensure this does not introduce performance
regressions even in stranger corner cases.

The above was explicitly noted in the commit message of the previous
revision, but I removed it by mistake while updating the message for v3.

I can switch to kref if there is agreement WRT such a performance
trade-off. Another option would be to add a couple of additional checks
for wrap-arounds in ep_put() and ep_get(), so that we get similar
safety guarantees but no additional atomic operations.
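
Such checks might look roughly like this (a self-contained, user-space C sketch, not the actual patch code; the saturation behaviour loosely mimics refcount_t, but with plain integers, since the count is only ever touched under ep->mtx):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal stand-in for the relevant eventpoll fields. */
struct eventpoll {
	unsigned int refcount;	/* protected by ep->mtx in the real code */
};

/* Saturate instead of wrapping around, loosely mimicking refcount_t. */
static void ep_get(struct eventpoll *ep)
{
	if (ep->refcount == UINT_MAX) {
		fprintf(stderr, "ep_get: refcount saturated\n");
		return;
	}
	ep->refcount++;
}

/* Returns true if the eventpoll can be disposed. */
static bool ep_put(struct eventpoll *ep)
{
	if (ep->refcount == 0 || ep->refcount == UINT_MAX) {
		fprintf(stderr, "ep_put: refcount corrupted\n");
		return false;
	}
	return --ep->refcount == 0;
}
```

The idea is that a wrapped or underflowed count sticks at a poison value and warns, instead of silently reaching zero and triggering a premature disposal.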


If it's already been assessed and discarded for a good reason, then I don't have a problem with sticking with this version. I didn't see any reference to this before, and krefs feel like the natural choice for refcounting, since they make it easier to get the implementation correct.

   #ifdef CONFIG_NET_RX_BUSY_POLL
   	/* used to track busy poll napi_id */
   	unsigned int napi_id;
@@ -240,9 +247,7 @@ struct ep_pqueue {
   /* Maximum number of epoll watched descriptors, per user */
   static long max_user_watches __read_mostly;
-/*
- * This mutex is used to serialize ep_free() and eventpoll_release_file().
- */
+/* Used for cycles detection */
   static DEFINE_MUTEX(epmutex);
static u64 loop_check_gen = 0;
@@ -555,8 +560,7 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
/*
    * This function unregisters poll callbacks from the associated file
- * descriptor.  Must be called with "mtx" held (or "epmutex" if called from
- * ep_free).
+ * descriptor.  Must be called with "mtx" held.
    */
   static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
   {
@@ -679,11 +683,38 @@ static void epi_rcu_free(struct rcu_head *head)
   	kmem_cache_free(epi_cache, epi);
   }
+static void ep_get(struct eventpoll *ep)
+{
+	ep->refcount++;
+}
This would become something like "kref_get(&ep->kref)", or maybe even
"kref_get_unless_zero()" or some other form, depending on exactly how
you acquire a pointer to an eventpoll structure.

No need for kref_get_unless_zero() here: at all ep_get() call-sites we
know that at least one other reference is alive and can't go away
concurrently.


Fair, yea.

+
+/*
+ * Returns true if the event poll can be disposed
+ */
+static bool ep_put(struct eventpoll *ep)
+{
+	if (--ep->refcount)
+		return false;
+
+	WARN_ON_ONCE(!RB_EMPTY_ROOT(&ep->rbr.rb_root));
+	return true;
+}

This could become kref_put(&ep->kref, ep_dispose).

I think it would be necessary to release the ep->mtx mutex before
invoking ep_dispose()...


Yes, you'd have to refactor things a bit, since ep_dispose() wouldn't actually get called until all outstanding references are dropped.

+
+static void ep_dispose(struct eventpoll *ep)
+{
+	mutex_destroy(&ep->mtx);
+	free_uid(ep->user);
+	wakeup_source_unregister(ep->ws);
+	kfree(ep);
+}
This would take a kref pointer, use container_of() to get to the
eventpoll structure, and then perform the necessary cleanup once all
references are dropped.

The exact steps here, and whether it would still be safe to call
mutex_destroy(), are a bit unclear, since you would typically only call
mutex_destroy() when it is absolutely certain that no one holds the mutex.
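
Putting these pieces together, the kref-based ordering being discussed might look roughly like this (a self-contained user-space sketch with toy stand-ins for the kref API and for mutex_lock()/mutex_unlock(); ep_free() here is illustrative, not the actual patch code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy, single-threaded stand-in for struct kref (the real one is atomic). */
struct kref { unsigned int refcount; };

static void kref_init(struct kref *k) { k->refcount = 1; }
static void kref_get(struct kref *k) { k->refcount++; }
static int kref_put(struct kref *k, void (*release)(struct kref *))
{
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

struct eventpoll {
	int mtx_locked;		/* toy stand-in for ep->mtx */
	struct kref kref;
};

static int disposed;

/* Release callback: recover the eventpoll via container_of-style math. */
static void ep_dispose(struct kref *kref)
{
	struct eventpoll *ep = (struct eventpoll *)
		((char *)kref - offsetof(struct eventpoll, kref));

	assert(!ep->mtx_locked);	/* must never run with ep->mtx held */
	free(ep);
	disposed = 1;
}

/* Drop ep->mtx *before* the final put, so ep_dispose() (and its
 * eventual mutex_destroy()) never runs with the mutex held. */
static void ep_free(struct eventpoll *ep)
{
	ep->mtx_locked = 1;	/* mutex_lock(&ep->mtx) */
	/* ... tear down the remaining epitems under the mutex ... */
	ep->mtx_locked = 0;	/* mutex_unlock(&ep->mtx) */
	kref_put(&ep->kref, ep_dispose);
}
```

The point of the ordering is that the last kref_put() sits outside the critical section, so the release callback can safely destroy the mutex.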

... due to the above. The current patch code ensures that ep_dispose()
is called only after the ep->mtx mutex is released.


Correct, because you use the dying flag, yep.


See Documentation/core-api/kref.rst for a better overview of the API and
how to use it safely. I suspect that with just kref you could safely
avoid the "dying" flag as well, but I am not 100% sure.

I *think* we will still need the 'dying' flag; otherwise ep_free()
can't tell whether the traversed epitem entries still hold a reference
to struct eventpoll - eventpoll_release_file() and ep_free() could
potentially try to release the same reference twice, and kref could
detect that race only for the last reference.
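
The race described here can be sketched in miniature (a hypothetical, heavily simplified user-space model, not kernel code; ep_release_epitem() is an invented name standing in for the common teardown step shared by eventpoll_release_file() and ep_free()):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy models of the two structs involved. */
struct epitem { bool dying; };
struct eventpoll { unsigned int refcount; };

/*
 * Both teardown paths would otherwise drop the same epitem reference.
 * Whoever marks the epitem dying first owns the put; the other path
 * sees dying == true and leaves the refcount alone.  Returns true if
 * this put was the last one and the eventpoll can be disposed.
 */
static bool ep_release_epitem(struct eventpoll *ep, struct epitem *epi)
{
	if (epi->dying)
		return false;	/* the other path already took this put */
	epi->dying = true;
	return --ep->refcount == 0;
}
```

With a bare kref, both paths would call kref_put() unconditionally, and the double put would only be caught when it underflows the final reference - hence the flag.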

Thanks!

Paolo


Right. Switching to krefs would require some thinking to work through the various cases of what gets a ref and how.

I'm not sure I follow how that would work; it sounds like something odd in the way references are being counted here. It's likely that I simply don't understand the full context.

Either way, if there is good reason to avoid kref due to the cost of atomics in this case, and given that we need the mutex anyway, I suppose I have no objections. I'm not sure what the performance cost of an atomic vs a simple integer is in practice, though.


