On Sat, Feb 26, 2022 at 4:16 AM Hou Tao <houtao1@xxxxxxxxxx> wrote:
>
> For now, our case is a write-once case, so only lookup is considered.
> When the data set is bigger than 128KB, the hash table has better
> lookup performance, as shown below:
>
> | lookup all elem (ms) | 4K  | 16K  | 64K  | 128K  | 256K  | 512K  | 1M     | 2M     | 4M      | 7M      |
> | -------------------- | --- | ---- | ---- | ----- | ----- | ----- | ------ | ------ | ------- | ------- |
> | hash                 | 3.3 | 12.7 | 47   | 90.6  | 185.9 | 382.3 | 788.5  | 1622.4 | 3296    | 6248.7  |
> | tries                | 2   | 10   | 45.9 | 111.6 | 274.6 | 688.9 | 1747.2 | 4394.5 | 11229.8 | 27148.8 |
> | tst                  | 3.8 | 16.4 | 61.3 | 139.1 | 313.9 | 707.3 | 1641.3 | 3856.1 | 9002.3  | 19793.8 |

Yeah. It's hard to beat a hash lookup when it hits the good O(1) case,
but what are the chances that it stays this way?
Are you saying you can size up the table and populate it to a good fill
ratio just once? If so, it's probably better to replace all strings
with something like a good hash. 7M elements is not a lot. A hash
producing 8 or 16 bytes will have close to zero false positives, and in
the "populate the table once" case the hash seed can be precomputed and
adjusted, so you can guarantee zero collisions for the 7M strings
(see the seed-search sketch at the end of this mail).
While lookups of keys outside that set can still have a tiny (0.001%-ish)
chance of a false positive, there could be a way to deal with it after
the lookup.

> Ternary search tree always has better memory usage:
>
> | memory usage (MB) | 4K  | 16K | 64K  | 128K | 256K | 512K | 1M   | 2M    | 4M    | 7M     |
> | ----------------- | --- | --- | ---- | ---- | ---- | ---- | ---- | ----- | ----- | ------ |
> | hash              | 2.2 | 8.9 | 35.5 | 71   | 142  | 284  | 568  | 1136  | 2272  | 4302.5 |
> | tries             | 2.1 | 8.5 | 34   | 68   | 136  | 272  | 544  | 1088  | 2176  | 4106.9 |
> | tst               | 0.5 | 1.6 | 5.6  | 10.6 | 20.3 | 38.6 | 73.1 | 138.6 | 264.6 | 479.5  |

The ternary search tree looks amazing. Since you have a prototype, can
you wrap it into a new type of bpf map and post the patches? I wonder
what the data structures look like to achieve such memory efficiency
(a classic node layout is sketched below for reference).
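
To make the "precompute and adjust the seed" part concrete, here is a
minimal, untested userspace sketch. FNV-1a and the helper names are my
own picks for illustration, not anything the prototype has to use: keep
re-seeding until every key in the fixed set maps to a distinct 64-bit
digest, then store only the digests (and the chosen seed) in the map:

/*
 * Sketch: find a seed for which a 64-bit hash is collision-free over a
 * fixed, write-once key set. Run once in userspace before populating
 * the map.
 */
#include <stdint.h>
#include <stdlib.h>

static uint64_t fnv1a64(const char *s, uint64_t seed)
{
        uint64_t h = seed;

        while (*s) {
                h ^= (unsigned char)*s++;
                h *= 0x100000001b3ULL;          /* FNV-1a prime */
        }
        return h;
}

static int cmp_u64(const void *a, const void *b)
{
        uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

        return x < y ? -1 : x > y;
}

/* Return a seed with zero collisions over keys[], or 0 if none found. */
static uint64_t find_collision_free_seed(char * const *keys, size_t n)
{
        uint64_t *d = malloc(n * sizeof(*d));
        uint64_t seed = 0xcbf29ce484222325ULL;  /* FNV offset basis */
        int tries;

        if (!d)
                return 0;
        for (tries = 0; tries < 64; tries++) {
                size_t i;

                for (i = 0; i < n; i++)
                        d[i] = fnv1a64(keys[i], seed);
                qsort(d, n, sizeof(*d), cmp_u64);
                for (i = 1; i < n && d[i] != d[i - 1]; i++)
                        ;
                if (i == n) {           /* no duplicate digests */
                        free(d);
                        return seed;
                }
                seed += 0x9e3779b97f4a7c15ULL;  /* try the next seed */
        }
        free(d);
        return 0;
}

With 7M keys the 64-bit birthday bound makes the very first seed succeed
with high probability, and a 16-byte digest makes retries essentially
impossible; the remaining false-positive risk exists only for lookups of
keys that were never in the build set.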
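And on the memory-efficiency question, for reference: a classic
Bentley/Sedgewick ternary search tree node holds one byte of the key
plus three links, instead of the 256-entry child array a plain trie
node needs, and common prefixes are stored once along the middle-child
chain. A rough sketch of that textbook layout, which may well differ
from what the prototype does:

/*
 * Classic ternary search tree node: one byte of the key plus three
 * links. Sketch only; the prototype's actual layout may differ.
 */
struct tst_node {
        struct tst_node *lo;    /* next key byte sorts below c */
        struct tst_node *eq;    /* next key byte equals c: advance key */
        struct tst_node *hi;    /* next key byte sorts above c */
        unsigned char c;        /* one byte of the stored keys */
        unsigned char is_end;   /* some key terminates at this node */
};

/*
 * Lookup walks lo/hi like a binary search tree on the current key byte
 * and only consumes a byte when it matches, descending eq.
 */
static int tst_lookup(const struct tst_node *n, const char *key)
{
        while (n) {
                if ((unsigned char)*key < n->c)
                        n = n->lo;
                else if ((unsigned char)*key > n->c)
                        n = n->hi;
                else if (*++key == '\0')
                        return n->is_end;
                else
                        n = n->eq;
        }
        return 0;
}

Assuming 8-byte pointers that is roughly 26 bytes per node versus up to
2KB for a full 256-way trie node, though the trie in your benchmark is
presumably more compact than that.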