The memory improvement is minor, but any memory reduction at all is welcome at this point. Fortunately, this set of changes is unintrusive. I have some other ideas that I'll hopefully get to implement before swapping kills all my SSDs (see bottom). Eric Wong (3): list-objects-filter: use kh_size API treewide: switch to khashl for memory savings khashl: fix ensemble lookups on empty table builtin/fast-import.c | 2 +- builtin/fsmonitor--daemon.c | 4 +- delta-islands.c | 4 +- khash.h | 338 ----------------------- khashl.h | 522 ++++++++++++++++++++++++++++++++++++ list-objects-filter.c | 2 +- object-store-ll.h | 2 +- object-store.h | 7 +- oidset.h | 2 +- pack-bitmap.h | 2 +- 10 files changed, 535 insertions(+), 350 deletions(-) delete mode 100644 khash.h create mode 100644 khashl.h TODO (some ideas stolen from khashl): * obj_hash can probably use a 0.75 load factor (instead of 0.5) to save memory and not slow down too much. oid_* khash has always had 0.77 which was fine and now khashl has 0.75. 0.75 is cheaper to compute via (shifts + ORs) than 0.77. * obj_hash can use `i = (i + 1) & mask' like khashl does to avoid branches in linear probe loops * obj_hash can use realloc and copy in-place resize logic khashl does. khashl uses the ->used bitmap for this, but it should be possible to do for obj_hash w/o additional allocations via pointer tagging (not khashl inspired): * keep an identity pool of OIDs separately (similar to how Perl5 pools all hash keys) and only use tagged pointers for OIDs. Pointer tagging can be used to distinguish between 7 hash functions, leaving us room for 5 more in addition to SHA-(1|256). This change will be a large effort (with a hopefully large savings). If we ever need more than 7 hash functions, we can switch to storing hash type information in slabs that can be looked up using address ranges (AFAIK, jemalloc does this).