Re: [PATCH bpf-next v2 00/13] Add support for qp-trie with dynptr key

Hello all,

I have just found out about this qp-trie work, and I'm pleased to hear
that it is looking promising for you!

I have a few very broad observations:

The "q" in qp-trie doesn't have to stand for "quadbit". There's a tradeoff
between branch factor, maximum key length, and size of branch node. The
greater the branch factor, the fewer indirections needed to traverse the
tree; but if you go too wide then prefetching is less effective and branch
nodes get bigger. I found that 5 bits was the sweet spot (32-wide bitmap,
30ish-bit key length): indexing 5-bit mouthfuls out of the key is HORRID,
but it was measurably faster than 4 bits. 6 bits (64 bits of bitmap) grew
nodes from 16 bytes to 24 bytes, and it ended up slower.
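
Here, roughly, is what pulling a 5-bit mouthful out of a key involves
(a sketch for illustration rather than the code I actually use; the
helper name is invented):

    /* Sketch only: fetch the i'th 5-bit mouthful of the key.  Unlike a
     * 4-bit nibble, which is simply
     *     (key[i / 2] >> ((i & 1) ? 0 : 4)) & 0xf
     * a 5-bit unit can straddle a byte boundary, so we may need to read
     * two bytes and stitch them together. */
    #include <stddef.h>
    #include <stdint.h>

    static unsigned mouthful5(const uint8_t *key, size_t keylen, size_t i)
    {
        size_t bit = i * 5;
        size_t byte = bit / 8;
        unsigned shift = bit % 8;

        if (byte >= keylen)
            return 0;                        /* off the end of the key */
        unsigned word = (unsigned)key[byte] << 8;
        if (byte + 1 < keylen)
            word |= key[byte + 1];           /* second byte, if any */
        /* word holds 16 key bits starting at 'byte'; take 5 of them */
        return (word >> (11 - shift)) & 0x1f;
    }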

Your interior nodes are much bigger than mine, so you might find the
tradeoff is different. I encourage you to try it out.

I saw there has been some discussion about locking and RCU. My current
project is integrating a qp-trie into BIND, with the aim of replacing its
old red-black tree for searching DNS records. It's based on a concurrent
qp-trie that I prototyped in NSD (a smaller and simpler DNS server than
BIND). My strategy is based on a custom allocator for interior nodes. This
has two main effects:

  * Node references are now 32-bit indexes into the allocator's pool,
    instead of 64-bit pointers; nodes are 12 bytes instead of 16 bytes
    (see the sketch after this list).

  * The allocator supports copy-on-write and safe memory reclamation with
    a fairly small overhead, 3 x 32-bit counters per memory chunk (each
    chunk is roughly page sized).
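
To make the first point concrete, a branch node in this scheme looks
something like the following (a sketch; the field names here are
invented and the real layout differs in detail):

    /* Sketch of a 12-byte branch node: twigs are referenced by a
     * 32-bit index into the allocator's pool, not by a 64-bit pointer. */
    #include <stdint.h>

    struct qp_branch {
        uint32_t bitmap;   /* which mouthfuls have twigs */
        uint32_t keyoff;   /* offset into the key, plus tag bits */
        uint32_t twigs;    /* pool index of the first child node */
    };

    /* Each roughly page-sized chunk of the pool also carries the three
     * 32-bit counters used for copy-on-write and reclamation
     * bookkeeping; their exact roles are in the notes linked below. */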

I wrote some notes when the design was new, but things have changed since
then.

https://dotat.at/@/2021-06-23-page-based-gc-for-qp-trie-rcu.html

For memory reclamation the interior nodes get moved / compacted. It's a
kind of garbage collector, but easy-mode because the per-chunk counters
accurately indicate when compaction is worthwhile. I've written some notes
on my several failed GC experiments; the last / current attempt seems (by
and large) good enough.

https://dotat.at/@/2022-06-22-compact-qp.html
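
The shape of that decision is roughly this (a sketch with invented
names and an arbitrary threshold, just to show why the counters make
the collector's job easy):

    /* Sketch only: the per-chunk counters say exactly how much of a
     * chunk is garbage, so "is compaction worthwhile?" is a simple
     * comparison rather than a heap scan.  (The real design keeps
     * three counters per chunk; two are enough for this sketch.) */
    #include <stdbool.h>
    #include <stdint.h>

    struct chunk_counters {
        uint32_t used;    /* node cells handed out from this chunk */
        uint32_t freed;   /* node cells made unreachable by COW updates */
    };

    static bool worth_compacting(const struct chunk_counters *c)
    {
        /* invented threshold: compact once over half the chunk is dead */
        return c->used > 0 && c->freed * 2 > c->used;
    }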

For exterior / leaf nodes, I'm using atomic refcounts to know when they
can be reclaimed. The caller is responsible for COWing its leaves when
necessary.
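
In C11-atomics terms the leaf lifetime is roughly this (a minimal
sketch, not the actual interfaces):

    /* Minimal sketch of refcounted leaves: a leaf is freed only when
     * the last tree version (or reader) that refers to it lets go. */
    #include <stdatomic.h>
    #include <stdlib.h>

    struct leaf {
        atomic_uint refs;
        /* ... key and value ... */
    };

    static void leaf_ref(struct leaf *l)
    {
        atomic_fetch_add_explicit(&l->refs, 1, memory_order_relaxed);
    }

    static void leaf_unref(struct leaf *l)
    {
        if (atomic_fetch_sub_explicit(&l->refs, 1,
                                      memory_order_acq_rel) == 1)
            free(l);
    }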

Updates to the tree are transactional in style, and do not block readers:
a single writer gets the write mutex, makes whatever changes it needs
(copying as required), then commits by flipping the tree's root. After a
commit it can free unused chunks. (Compaction can be part of an update
transaction or a transaction of its own.)

I'm currently using a reader-writer lock for the tree root, but I designed
it with liburcu in mind, while trying to keep things simple.
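
With liburcu the read side would be about this simple (a sketch;
qp_walk is a made-up stand-in for the traversal, and the commit above
becomes rcu_assign_pointer() plus synchronize_rcu() before reclaiming
chunks):

    /* Sketch of an RCU read side: readers take no lock, they just
     * bracket the traversal in a read-side critical section.  Each
     * reader thread calls rcu_register_thread() once at startup. */
    #include <stddef.h>
    #include <stdint.h>
    #include <urcu.h>

    extern struct qp_branch *qp_root;   /* set with rcu_assign_pointer() */
    struct leaf *qp_walk(struct qp_branch *root,
                         const uint8_t *key, size_t len);
    void leaf_ref(struct leaf *l);      /* refcount, as sketched above */

    struct leaf *qp_lookup(const uint8_t *key, size_t len)
    {
        rcu_read_lock();
        struct leaf *l = qp_walk(rcu_dereference(qp_root), key, len);
        if (l != NULL)
            leaf_ref(l);   /* pin the leaf before leaving the section */
        rcu_read_unlock();
        return l;
    }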

This strategy is very heavily biased in favour of readers, which suits DNS
servers. I don't know enough about BPF to have any idea what kind of
update traffic you need to support.

At the moment I am reworking and simplifying my transaction and
reclamation code and it's all very broken. I guess this isn't the best
possible time to compare notes on qp-trie variants, but I'm happy to hear
from others who have code and ideas to share.

-- 
Tony Finch  <dot@xxxxxxxx>  https://dotat.at/
Mull of Kintyre to Ardnamurchan Point: East or southeast 4 to 6,
increasing 6 to gale 8 for a time. Smooth or slight in eastern
shelter, otherwise slight or moderate. Rain or showers. Good,
occasionally poor.


