On 6/13/2023 1:39 PM, Jonathan Tan wrote: > commitGraph.changedPathsVersion:: > Specifies the version of the changed-path Bloom filters that Git will read and > - write. May be 0 or 1. Any changed-path Bloom filters on disk that do not > + write. May be 0, 1, or 2. Any changed-path Bloom filters on disk that do not > match the version set in this config variable will be ignored. > + > Defaults to 1. Is this a good place to document the planned modification of this default in future versions of Git? > +static uint32_t murmur3_seeded_v1(uint32_t seed, const char *data, size_t len) > { > const uint32_t c1 = 0xcc9e2d51; > const uint32_t c2 = 0x1b873593; > @@ -130,8 +187,10 @@ void fill_bloom_key(const char *data, > int i; > const uint32_t seed0 = 0x293ae76f; > const uint32_t seed1 = 0x7e646e2c; > - const uint32_t hash0 = murmur3_seeded(seed0, data, len); > - const uint32_t hash1 = murmur3_seeded(seed1, data, len); > + const uint32_t hash0 = (settings->hash_version == 2 > + ? murmur3_seeded_v2 : murmur3_seeded_v1)(seed0, data, len); > + const uint32_t hash1 = (settings->hash_version == 2 > + ? murmur3_seeded_v2 : murmur3_seeded_v1)(seed1, data, len); This is the critical step. I've not seen this ternary trick within a function name choice, but it makes for a compact check. Nice. However, I think the 'settings->hash_version' is the wrong place to look for the condition. We should be getting this value from the commit-graph we are reading. (More on this later.) > struct bloom_filter_settings { > /* > * The version of the hashing technique being used. > - * We currently only support version = 1 which is > + * The newest version is 2, which is > * the seeded murmur3 hashing technique implemented > - * in bloom.c. > + * in bloom.c. Bloom filters of version 1 were created > + * with prior versions of Git, which had a bug in the > + * implementation of the hash function. ... > +struct graph_read_bloom_data_data { > + struct commit_graph *g; > + int commit_graph_changed_paths_version; > +}; > + > static int graph_read_bloom_data(const unsigned char *chunk_start, > size_t chunk_size, void *data) > { > - struct commit_graph *g = data; > + struct graph_read_bloom_data_data *d = data; > + struct commit_graph *g = d->g; > uint32_t hash_version; > g->chunk_bloom_data = chunk_start; > hash_version = get_be32(chunk_start); > > - if (hash_version != 1) > + if (hash_version != d->commit_graph_changed_paths_version) > return 0; This makes it appear like we cannot read a commit-graph that has a Bloom filter version that doesn't match the configured version. This seems incorrect. If we want to configure to _write_ v2, we should still be able to _read_ v1 concurrently until those v2 filters are written. This check should be: if (hash_version <= 0 || hash_version > 2) return 0; and then g->filter_hash_version = hash_version; to store this hash version somewhere in the graph. This way, we can read any commit-graph file and will not suffer performance problems in the time between setting the config value and writing the new commit-graph file. > - if (s->commit_graph_changed_paths_version == 1) { > + if (s->commit_graph_changed_paths_version == 1 > + || s->commit_graph_changed_paths_version == 2) { Perhaps this could just be if (s->commit_graph_changed_paths_version) { to say "not zero means we still read the filters". Though, since this config _should_ mean "which version do we _write_?" it might be good to go back on the "unifying the two config options". > + struct graph_read_bloom_data_data data = { > + .g = graph, > + .commit_graph_changed_paths_version = s->commit_graph_changed_paths_version > + }; > pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES, > &graph->chunk_bloom_indexes); > read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA, > - graph_read_bloom_data, graph); > + graph_read_bloom_data, &data); > } Much of this block will not be necessary as we don't need to send the repo settings into the read_chunk() method. > + if (r->settings.commit_graph_changed_paths_version < 0 > + || r->settings.commit_graph_changed_paths_version > 2) { > + warning(_("attempting to write a commit-graph, but 'commitgraph.changedPathsVersion' (%d) is not supported"), > + r->settings.commit_graph_changed_paths_version); > + return 0; I see the "< 0" means we aren't considering the case of disabling writes with the zero value. We should exit early if we see a zero here, for extra safety. > + } > + bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2 > + ? 2 : 1; Once we've checked that this value is not zero, we can do this: bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version; > /* We have changed-paths already. Keep them in the next graph */ > - if (g && g->chunk_bloom_data) { > + if (g && g->bloom_filter_settings) { > ctx->changed_paths = 1; > ctx->bloom_settings = g->bloom_filter_settings; > } > --- a/t/t4216-log-bloom.sh > +++ b/t/t4216-log-bloom.sh > @@ -450,4 +450,35 @@ test_expect_success 'version 1 changed-path used when version 1 requested' ' > test_bloom_filters_used "-- $CENT") > ' > > +test_expect_success 'version 1 changed-path not used when version 2 requested' ' > + (cd highbit1 && > + git config --add commitgraph.changedPathsVersion 2 && > + test_bloom_filters_not_used "-- $CENT") > +' I think this test is the wrong behavior. We should be able to use version 1 when version 2 is requested. Instead, start with version 1, and _upgrade_ to version 2 and then check which version exists in the file. We should only _not_ use the filters if the version is 0 (or commitGraph.readChangedPaths=false). I think this might be a good enough reason to > + > +test_expect_success 'set up repo with high bit path, version 2 changed-path' ' > + git init highbit2 && > + git -C highbit2 config --add commitgraph.changedPathsVersion 2 && > + test_commit -C highbit2 c2 "$CENT" && > + git -C highbit2 commit-graph write --reachable --changed-paths > +' > + > +test_expect_success 'check value of version 2 changed-path' ' > + (cd highbit2 && > + printf "c01f" >expect && > + get_first_changed_path_filter >actual && > + test_cmp expect actual) > +' > + > +test_expect_success 'version 2 changed-path used when version 2 requested' ' > + (cd highbit2 && > + test_bloom_filters_used "-- $CENT") > +' > + > +test_expect_success 'version 2 changed-path not used when version 1 requested' ' > + (cd highbit2 && > + git config --add commitgraph.changedPathsVersion 1 && > + test_bloom_filters_not_used "-- $CENT") > +' Again, this is also the wrong situation. Thanks, -Stolee