From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> Date: Mon, 23 Feb 2015 13:52:49 -0800 > On Mon, Feb 23, 2015 at 09:03:58PM +0000, Thomas Graf wrote: >> On 02/23/15 at 11:12am, josh@xxxxxxxxxxxxxxxx wrote: >> > In theory, resizes should only take the locks for the buckets they're >> > currently unzipping, and adds should take those same locks. Neither one >> > should take a whole-table lock, other than resize excluding concurrent >> > resizes. Is that still insufficient? >> >> Correct, this is what happens. The problem is basically that >> if we insert from atomic context we cannot slow down inserts >> and the table may not grow quickly enough. >> >> > Yeah, the add/remove statistics used for tracking would need some >> > special handling to avoid being a table-wide bottleneck. >> >> Daniel is working on a patch to do per-cpu element counting >> with a batched update cycle. > > One approach is simply to count only when a resize operation is in > flight. Another is to keep a per-bucket count, which can be summed > at the beginning of the next resize operation. I think we should think exactly about what we should do when someone loops non-stop adding 1 million entries to the hash table and the initial table size is very small. This is a common use case for at least one of the current rhashtable users (nft_hash). When you load an nftables rule with a large set of IP addresses attached, this is what happens. Yes I understand that nftables could give a hint and start with a larger hash size from the start when it knows this is going to happen, but I still believe that we should behave reasonably when starting from a small table. I'd say that with the way things work right now, in this situation it actually hurts to allow asynchronous inserts during a resize. Because we end up with extremely long hash table chains, and thus make the resize work and the lookups both take an excruciatingly long amount of time to complete. I just did a quick scan of all code paths that do inserts into an rhashtable, and it seems like all of them can easily block. So why don't we do that? Make inserts sleep on an rhashtable expansion waitq. There could even be a counter of pending inserts, so the expander can decide to expand further before waking the inserting threads up. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html