Re: Ottawa and slow hash-table resize

David Miller <davem@xxxxxxxxxxxxx> · Tue, 24 Feb 2015 12:09:44 -0500 (EST)

From: Patrick McHardy <kaber@xxxxxxxxx>
Date: Tue, 24 Feb 2015 10:39:18 +0000

> On 24.02, Thomas Graf wrote:
>> On 02/23/15 at 03:06pm, Paul E. McKenney wrote:
>> > On Mon, Feb 23, 2015 at 05:32:52PM -0500, David Miller wrote:
>> > > I just did a quick scan of all code paths that do inserts into an
>> > > rhashtable, and it seems like all of them can easily block.  So why
>> > > don't we do that?  Make inserts sleep on an rhashtable expansion
>> > > waitq.
>> > > 
>> > > There could even be a counter of pending inserts, so the expander can
>> > > decide to expand further before waking the inserting threads up.
>> > 
>> > Should be reasonably simple, and certainly seems worth a try!
>> 
>> Agreed. Definitely desirable for nft_hash. I like the pending counter
>> idea. I'm experimenting with various ideas on blocking inserts for
>> Netlink. Blocking too long might open DoS vectors as one app could
>> easily delay the creation of sockets for other applications.
> 
> Regarding nft_hash, blocking in the netlink path certainly seems fine,
> but we will soon also have inserts from the packet processing path,
> where we obviously can't block.

Indeed, I remembered last night that for TCP sockets blocking will not
work at all.

And having a flood of 1 million new TCP connections all at once
shouldn't knock us over.

Therefore, we will need to find a way to handle this problem without
being able to block on insert.

Thinking about this: if inserts occur while a resize is pending, and the
table's nelems has already exceeded even the grow threshold of the new
table, it makes no sense to allow these async inserts, as they are
going to make the resize take longer and prolong the pain.

On one hand I like the async resize because it means that an insert
that triggers the resize doesn't incur a huge latency spike since
it was simply unlucky to be the resize trigger event.  The async
resize smooths out the cost of the resize across the system.
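A minimal user-space sketch of that async scheme, assuming the insert
path only flags that a resize is wanted and a deferred worker does the
doubling later. struct model_table, model_insert() and
model_resize_worker() are hypothetical names, not the rhashtable API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not the rhashtable API. */
struct model_table {
	size_t size;         /* number of buckets */
	size_t nelems;       /* number of stored elements */
	bool resize_pending; /* set by inserts, cleared by the worker */
};

/* 75% load test in the spirit of rht_grow_above_75(). */
static inline bool model_grow_above_75(const struct model_table *t)
{
	return t->nelems * 4 > t->size * 3;
}

/* Insert path: O(1), it only flags that a resize is wanted. In the
 * kernel this is roughly where a work item would be scheduled. */
static void model_insert(struct model_table *t)
{
	t->nelems++;
	if (model_grow_above_75(t))
		t->resize_pending = true;
}

/* Deferred worker: pays for the resize outside the insert path.
 * A real resize would also rehash every chain into the new table.
 * If the doubled table is still over 75%, the flag stays set so the
 * worker can expand again. */
static void model_resize_worker(struct model_table *t)
{
	if (!t->resize_pending)
		return;
	t->size *= 2;
	t->resize_pending = model_grow_above_75(t);
}
```

The insert that happens to cross 75% returns as quickly as any other
insert; the expensive part lands in model_resize_worker().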

This scheme works really well if, on average, the resize operation
completes before enough subsequent inserts occur to exceed even
the resized table's resize threshold.
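To put a number on "enough subsequent inserts", assume a 75% grow
threshold and a resize that doubles the table from n buckets. The
resize fires once nelems exceeds 0.75 * n, and the new table's own
threshold is 0.75 * 2n = 1.5n, so roughly 0.75 * n further inserts can
land before the resize is already too small. The helper below (a
hypothetical name) computes that budget:

```c
#include <stddef.h>

/* Hypothetical helper: how many more inserts a resize to new_size can
 * absorb before nelems exceeds 75% of new_size, i.e. before the new
 * table would itself need to grow. Returns 0 if it is already over.
 * new_size / 4 * 3 is exactly 75% for power-of-two sizes >= 4. */
static size_t model_insert_budget(size_t nelems, size_t new_size)
{
	size_t new_threshold = new_size / 4 * 3;

	return nelems >= new_threshold ? 0 : new_threshold - nelems;
}
```

For example, growing from 1024 to 2048 buckets, triggered at
nelems = 769, leaves a budget of 1536 - 769 = 767 inserts; if the
resize cannot finish within that window, it is already behind.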

So I think what I'm getting at is that we can allow parallel inserts
but only up until the point where the resized table's thresholds are
exceeded.
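A minimal non-blocking admission check along those lines, assuming the
table records the size of the in-flight target table (future_size, 0
when no resize is running) and that refusing an insert with -EBUSY is
acceptable to the caller. All names are hypothetical, triggering the
resize itself is omitted, and how a caller such as the TCP path should
react to a refusal is left open:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not the rhashtable API. */
struct model_table {
	size_t size;        /* buckets in the current table */
	size_t future_size; /* buckets in the table being built, 0 if none */
	size_t nelems;
};

static inline bool model_above_75(size_t nelems, size_t size)
{
	return nelems * 4 > size * 3;
}

/* Never sleeps, so it could sit on a packet-processing path. Parallel
 * inserts are admitted during a resize only until they would push the
 * resized table past its own grow threshold. */
static int model_insert_nonblocking(struct model_table *t)
{
	if (t->future_size && model_above_75(t->nelems + 1, t->future_size))
		return -EBUSY; /* would only prolong the pending resize */
	t->nelems++;
	return 0;
}
```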

Looking at how to implement this, I think that there is too much
configurability to this code.  There is no reason to have indirect
calls for the grow decision.  This should be a quick test, but it isn't,
because we go through ->grow_decision.  It should just be
rht_grow_above_75 or whatever, and inline this crap!

Nobody even uses this indirection capability; it's therefore
over-engineered :-)
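A user-space sketch of the difference, assuming a params struct with a
grow_decision function pointer loosely shaped like the one the insert
path goes through today (struct model_params and the other names are
hypothetical, and the real kernel signatures may differ). The indirect
version pays a function-pointer call on every insert; the inline
version lets the compiler fold a fixed 75% test into the fast path:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not the rhashtable API. */
struct model_table {
	size_t size;
	size_t nelems;
};

/* Current style (modeled): the grow test is a callback in the params. */
struct model_params {
	bool (*grow_decision)(const struct model_table *t, size_t new_size);
};

static bool model_grow_above_75(const struct model_table *t, size_t new_size)
{
	return t->nelems * 4 > new_size * 3;
}

static bool should_grow_indirect(const struct model_params *p,
				 const struct model_table *t)
{
	/* Indirect call on every insert, even though the only policy
	 * anyone plugs in is the 75% one above. */
	return p->grow_decision && p->grow_decision(t, t->size);
}

/* Proposed style: one fixed policy, static inline, no indirection. */
static inline bool should_grow_inline(const struct model_table *t)
{
	return t->nelems * 4 > t->size * 3;
}
```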
