Re: Ottawa and slow hash-table resize

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> · Mon, 23 Feb 2015 15:06:19 -0800

On Mon, Feb 23, 2015 at 05:32:52PM -0500, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Mon, 23 Feb 2015 13:52:49 -0800
> 
> > On Mon, Feb 23, 2015 at 09:03:58PM +0000, Thomas Graf wrote:
> >> On 02/23/15 at 11:12am, josh@xxxxxxxxxxxxxxxx wrote:
> >> > In theory, resizes should only take the locks for the buckets they're
> >> > currently unzipping, and adds should take those same locks.  Neither one
> >> > should take a whole-table lock, other than resize excluding concurrent
> >> > resizes.  Is that still insufficient?
> >> 
> >> Correct, this is what happens. The problem is basically that
> >> if we insert from atomic context we cannot slow down inserts
> >> and the table may not grow quickly enough.
> >> 
> >> > Yeah, the add/remove statistics used for tracking would need some
> >> > special handling to avoid being a table-wide bottleneck.
> >> 
> >> Daniel is working on a patch to do per-cpu element counting
> >> with a batched update cycle.
> > 
> > One approach is simply to count only when a resize operation is in
> > flight.  Another is to keep a per-bucket count, which can be summed
> > at the beginning of the next resize operation.
> 
> I think we should think exactly about what we should do when someone
> loops non-stop adding 1 million entries to the hash table and the
> initial table size is very small.
> 
> This is a common use case for at least one of the current rhashtable
> users (nft_hash).  When you load an nftables rule with a large set
> of IP addresses attached, this is what happens.
> 
> Yes I understand that nftables could give a hint and start with a
> larger hash size from the start when it knows this is going to happen,
> but I still believe that we should behave reasonably when starting
> from a small table.
> 
> I'd say that with the way things work right now, in this situation it
> actually hurts to allow asynchronous inserts during a resize.  Because
> we end up with extremely long hash table chains, and thus make the
> resize work and the lookups both take an excruciatingly long amount of
> time to complete.
> 
> I just did a quick scan of all code paths that do inserts into an
> rhashtable, and it seems like all of them can easily block.  So why
> don't we do that?  Make inserts sleep on an rhashtable expansion
> waitq.
> 
> There could even be a counter of pending inserts, so the expander can
> decide to expand further before waking the inserting threads up.

Should be reasonably simple, and certainly seems worth a try!

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html