On 2022.06.23 14:59, Junio C Hamano wrote: > Josh Steadmon <steadmon@xxxxxxxxxx> writes: > > > This series of changes broke fuzz-commit-graph, which attempts to parse > > arbitrary fuzzing-engine-provided bytes as a commit graph file. > > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > > since we run the fuzz tests without a valid repository, we are hitting > > the BUG() from 44c7e62 for every test case. > > > > Fix this by moving the majority of the implementaiton of > > `parse_commit_graph()` into a new function, > > `parse_commit_graph_settings()` that accepts a repo_settings pointer. > > This allows fuzz-commit-graph to continue to test the commit-graph > > parser implementation without relying on prepare_repo_settings(). > > It sounds like this is not a "fix" but a workaround to bend the > production code so that a non-production test shim can be inserted > more easily. > > I am OK with the idea, but have a huge problem with the new name. > > Is it just me who thinks parse_commit_graph_settings() is a function > that parses some kind of settings that affects the way the commit > graph gets used or written? > > Stepping back a bit, why can't fuzz-commit-graph prepare a > repository object that looks sufficiently real? Something along the > lines of... > > struct repository fake_repo; > > fake_repo.settings.initialized = 1; > fake_repo.gitdir = "."; > parse_commit_graph(&fake_repo, (void *)data, size); > ... > > Also, I feel somewhat uneasy to see these changes: > > > - if (get_configured_generation_version(r) >= 2) { > > + if (s->commit_graph_generation_version >= 2) { > > - if (r->settings.commit_graph_read_changed_paths) { > > + if (s->commit_graph_read_changed_paths) { > > - ctx->write_generation_data = (get_configured_generation_version(r) == 2); > > + ctx->write_generation_data = (r->settings.commit_graph_generation_version == 2); > > ctx->num_generation_data_overflows = 0; > > that makes the production code bend over backwards to _avoid_ > referencing 'r', only to cater to the test shim. That's an > artificial limitation we are forcing on our developers who works on > this code. It might be that what is in the repository settings is > sufficient for today's code to work, but I do not think needs for > fuzz tests should tie the hand of this production code by forbidding > it to look at other things in the repository in the future. After > all, tests are to serve the production code, not the other way > around. > > On the other hand, I think a change that is slightly smaller than > the posted patch, which justifies itself with a completely different > rationale, would be totally acceptable. You can justify this change > with NO mention of fuzzers. > > The parse_commit_graph() function takes a "struct repository *" > pointer, but all it cares about is the .settings member of it. > > Update the function and all its existing callers so that it > takes "struct repo_settings *" instead. > > Now, in the future, some developers _might_ find it necessary to > access stuff other than the repository settings to validate the > contents of the graph file, and we may need to change it to take a > full repository structure again. The test should adjust to such > needs of the production code, not the other way around. But until > then... > > Thanks. I trimmed down the changes and reworded the commit message for V4. I'm assuming you don't object to mentioning the fuzzer, just that it shouldn't be the main justifiation for the change. Sorry for the delayed response, and thanks for the feedback.