On 12/3/2020 5:43 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> ... >> - case GRAPH_CHUNKID_DATA: >> - if (graph->chunk_commit_data) >> - chunk_repeated = 1; >> - else >> - graph->chunk_commit_data = data + chunk_offset; >> - break; >> + /* limit the chunk-format list if we are ignoring Bloom filters */ >> + if (!r->settings.commit_graph_read_changed_paths) >> + chunks_nr = 5; > > I presume that this magic number 5 is directly connected to ... > >> +static struct read_chunk_info read_chunks[] = { >> + [0] = { >> + GRAPH_CHUNKID_OIDFANOUT, >> + graph_read_oid_fanout >> + }, >> + [1] = { >> + GRAPH_CHUNKID_OIDLOOKUP, >> + graph_read_oid_lookup >> + }, >> + [2] = { >> + GRAPH_CHUNKID_DATA, >> + graph_read_data >> + }, >> + [3] = { >> + GRAPH_CHUNKID_EXTRAEDGES, >> + graph_read_extra_edges >> + }, >> + [4] = { >> + GRAPH_CHUNKID_BASE, >> + graph_read_base_graphs >> + }, >> + [5] = { >> + GRAPH_CHUNKID_BLOOMINDEXES, >> + graph_read_bloom_indices >> + }, >> + [6] = { >> + GRAPH_CHUNKID_BLOOMDATA, >> + graph_read_bloom_data >> + } >> +}; > > ... the index of these entries in the table. > > I find this arrangement brittle. For one thing, it means that new > chunks cannot be added at an arbitrary place, and BLOOMDATA must be > at the end for the "limit the list when ignoring" logic to work > correctly (even when the developer who is adding a new chunk type > realizes that new one must be inserted before [5], and the magic > number 5 above must be updated), and it won't work at all if a > similar "we can optionally ignore reading the data" treatment needs > to be made for new chunk types, since two or more things cannot be > at the end of the list at the same time X-<. > > Would it be a cleaner way to implement this "when member X in the > settings structure is not set, ignore BLOOMINDEXES and BLOOMDATA" > feature to let these graph_read_*() functions have access to the > settings structure, so that bloom_indices and bloom_data callback > functions can make decisions for themselves? > > Once we do that, as far as I can tell, there is no longer any reason > to initialize read_chunks[] array with designated initializer. The > implementation of read_table_of_contents() does not tolerate if this > array is sparsely populated anyway, and except for the "do we ignore > bloom?" hack, nothing really depends on the order of entries in the > array, right? This is a good point, and it's my fault for not noticing this behavior. In an earlier version, I was initializing read_chunks dynamically inside parse_commit_graph(), where the change made more sense (the Bloom chunks were not initialized at all based on this condition). I think the better way to handle this is to check the config within the reading functions graph_read_bloom_(indices|data). The added benefit of checking in the read_chunk_fn is that the chunk-format API can see that these chunks are "known" chunk types, in case we were to add logging for "I don't understand this chunk type". Thanks, -Stolee