On Fri, Apr 8, 2022 at 8:08 PM Hou Tao <houtao1@xxxxxxxxxx> wrote: > > Hi, > > On 4/7/2022 1:38 AM, Andrii Nakryiko wrote: > > On Thu, Mar 31, 2022 at 5:04 AM Hou Tao <houtao1@xxxxxxxxxx> wrote: > >> Hi, > >> > >> The initial motivation for the patchset is due to the suggestion of Alexei. > >> During the discuss of supporting of string key in hash-table, he saw the > >> space efficiency of ternary search tree under our early test and suggest > >> us to post it as a new bpf map [1]. > >> > >> Ternary search tree is a special trie where nodes are arranged in a > >> manner similar to binary search tree, but with up to three children > >> rather than two. The three children correpond to nodes whose value is > >> less than, equal to, and greater than the value of current node > >> respectively. > >> > >> In ternary search tree map, only the valid content of string is saved. > >> The trailing null byte and unused bytes after it are not saved. If there > >> are common prefixes between these strings, the prefix is only saved once. > >> Compared with other space optimized trie (e.g. HAT-trie, succinct trie), > >> the advantage of ternary search tree is simple and being writeable. > >> > >> Below are diagrams for ternary search map when inserting hello, he, > >> test and tea into it: > >> > >> 1. insert "hello" > >> > >> [ hello ] > >> > >> 2. insert "he": need split "hello" into "he" and "llo" > >> > >> [ he ] > >> | > >> * > >> | > >> [ llo ] > >> > >> 3. insert "test": add it as right child of "he" > >> > >> [ he ] > >> | > >> *-------x > >> | | > >> [ llo ] [ test ] > >> > >> 5. insert "tea": split "test" into "te" and "st", > >> and insert "a" as left child of "st" > >> > >> [ he ] > >> | > >> x------*-------x > >> | | | > >> [ ah ] [ llo ] [ te ] > >> | > >> * > >> | > >> [ st ] > >> | > >> x----* > >> | > >> [ a ] > >> > >> As showed in above diagrams, the common prefix between "test" and "tea" > >> is "te" and it only is saved once. Also add benchmarks to compare the > >> memory usage and lookup performance between ternary search tree and > >> hash table. When the common prefix is lengthy (~192 bytes) and the > >> length of suffix is about 64 bytes, there are about 2~3 folds memory > >> saving compared with hash table. But the memory saving comes at prices: > >> the lookup performance of tst is about 2~3 slower compared with hash > >> table. See more benchmark details on patch #2. > >> > >> Comments and suggestions are always welcome. > >> > > Have you heard and tried qp-trie ([0]) by any chance? It is elegant > > and simple data structure. By all the available benchmarks it handily > > beats Red-Black trees in terms of memory usage and performance (though > > it of course depends on the data set, just like "memory compression" > > for ternary tree of yours depends on large set of common prefixes). > > qp-trie based BPF map seems (at least on paper) like a better > > general-purpose BPF map that is dynamically sized (avoiding current > > HASHMAP limitations) and stores keys in sorted order (and thus allows > > meaningful ordered iteration *and*, importantly for longest prefix > > match tree, allows efficient prefix matches). I did a quick experiment > > about a month ago trying to replace libbpf's internal use of hashmap > > with qp-trie for BTF string dedup and it was slightly slower than > > hashmap (not surprisingly, though, because libbpf over-sizes hashmap > > to avoid hash collisions and long chains in buckets), but it was still > > very decent even in that scenario. So I've been mulling the idea of > > implementing BPF map based on qp-trie elegant design and ideas, but > > can't find time to do this. > I have heard about it when check the space efficient of HAT trie [0], because > qp-trie needs to save the whole string key in the leaf node and its space > efficiency can not be better than ternary search tree for strings with common > prefix, so I did not consider about it. But I will do some benchmarks to check > the lookup performance and space efficiency of qp-trie and tst for string with > common prefix and strings without much common prefix. > If qp-trie is better, I think I can take the time to post it as a bpf map if you > are OK with that. You can probably always craft a data set where prefix sharing is so prevalent that space savings are very significant. But I think for a lot of real-world data it won't be as extreme and qp-trie might be very comparable (if not more memory-efficient) due to very compact node layout (which was the point of qp-trie). So I'd be really curious to see some comparisons. Would be great if you can try both! > > > > > > This prefix sharing is nice when you have a lot of long common > > prefixes, but I'm a bit skeptical that as a general-purpose BPF data > > structure it's going to be that beneficial. 192 bytes of common > > prefixes seems like a very unusual dataset :) > Yes. The case with common prefix I known is full file path. > > More specifically about TST implementation in your paches. One global > > per-map lock I think is a very big downside. We have LPM trie which is > > very slow in big part due to global lock. It might be possible to > > design more granular schema for TST, but this whole in-place splitting > > logic makes this harder. I think qp-trie can be locked in a granular > > fashion much more easily by having a "hand over hand" locking: lock > > parent, find child, lock child, unlock parent, move into child node. > > Something like that would be more scalable overall, especially if the > > access pattern is not focused on a narrow set of nodes. > Yes. The global lock is a problem but the splitting is not in-place. I will try > to figure out whether the lock can be more scalable after the benchmark test > between qp-trie and tst. Great, looking forward! > > Regards, > Tao > > [0]: https://github.com/Tessil/hat-trie > > > > Anyways, I love data structures and this one is an interesting idea. > > But just my few cents of "production-readiness" for general-purpose > > data structures for BPF. > > > > [0] https://dotat.at/prog/qp/README.html > > > >> Regards, > >> Tao > >> > >> [1]: https://lore.kernel.org/bpf/CAADnVQJUJp3YBcpESwR3Q1U6GS1mBM=Vp-qYuQX7eZOaoLjdUA@xxxxxxxxxxxxxx/ > >> > >> Hou Tao (2): > >> bpf: Introduce ternary search tree for string key > >> selftests/bpf: add benchmark for ternary search tree map > >> > >> include/linux/bpf_types.h | 1 + > >> include/uapi/linux/bpf.h | 1 + > >> kernel/bpf/Makefile | 1 + > >> kernel/bpf/bpf_tst.c | 411 +++++++++++++++++ > >> tools/include/uapi/linux/bpf.h | 1 + > >> tools/testing/selftests/bpf/Makefile | 5 +- > >> tools/testing/selftests/bpf/bench.c | 6 + > >> .../selftests/bpf/benchs/bench_tst_map.c | 415 ++++++++++++++++++ > >> .../selftests/bpf/benchs/run_bench_tst.sh | 54 +++ > >> tools/testing/selftests/bpf/progs/tst_bench.c | 70 +++ > >> 10 files changed, 964 insertions(+), 1 deletion(-) > >> create mode 100644 kernel/bpf/bpf_tst.c > >> create mode 100644 tools/testing/selftests/bpf/benchs/bench_tst_map.c > >> create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_tst.sh > >> create mode 100644 tools/testing/selftests/bpf/progs/tst_bench.c > >> > >> -- > >> 2.31.1 > >> > > . >