On 2022.03.23 15:52, Taylor Blau wrote: > On Wed, Mar 23, 2022 at 03:22:13PM -0400, Derrick Stolee wrote: > > On 3/23/2022 2:03 PM, Josh Steadmon wrote: > > > prepare_repo_settings() initializes a `struct repository` with various > > > default config options and settings read from a repository-local config > > > file. In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only > > > in git repos), prepare_repo_settings was changed to issue a BUG() if it > > > is called by a process whose CWD is not a Git repository. This approach > > > was suggested in [1]. > > > > > > This breaks fuzz-commit-graph, which attempts to parse arbitrary > > > fuzzing-engine-provided bytes as a commit graph file. > > > commit-graph.c:parse_commit_graph() calls prepare_repo_settings(), but > > > since we run the fuzz tests without a valid repository, we are hitting > > > the BUG() from 44c7e62 for every test case. > > > > > > Fix this by refactoring prepare_repo_settings() such that it sets > > > default options unconditionally; if its process is in a Git repository, > > > it will also load settings from the local config. This eliminates the > > > need for a BUG() when not in a repository. > > > > I think you have the right idea and this can work. > > Hmmm. To me this feels like bending over backwards in > `prepare_repo_settings()` to accommodate one particular caller. I'm not > necessarily opposed to it, but it does feel strange to make > `prepare_repo_settings()` a noop here, since I would expect that any > callers who do want to call `prepare_repo_settings()` are likely > convinced that they are inside of a repository, and it probably should > be a BUG() if they aren't. > > I was initially thinking that Josh's alternative of introducing and > calling a lower-level version of `prepare_commit_graph()` that doesn't > require the use of any repository settings would make sense. > > But when I started looking at implementing it, I became confused at how > this is supposed to work at all without using a repository. We depend on > the values parsed from: > > - commitGraph.generationVersion > - commitGraph.readChangedPaths > > to determine which part(s) of the commit graph we do and don't read. > > Looking around, I think I probably inadvertently broke this in > ab14d0676c (commit-graph: pass a 'struct repository *' in more places, > 2020-09-09). But prior to ab14d0676c, neither of those settings existed, > so parsing the commit graph was a pure function of the commit graph's > contents alone, and didn't rely on the existence of a repository. Yeah, I have not done a great job keeping the fuzzers up to date with commit-graph changes :(. > We could pretend as if `commitGraph.generationVersion` is always "2" and > `commitGraph.readChangedPaths` is always "true", and I think that would > still give us good-enough coverage. It might also be worthwhile for the fuzzer to test each interesting combination of settings, using the same arbitrary input. > I assume that libFuzzer doesn't give us a convenient way to stub out a > repository. Maybe we should have a variant of `parse_commit_graph` that > instead takes a `struct repo_settings` filled out with the defaults? > > We could use that to teach libFuzzer about additional states that the > commit graph parser can be in, too (though probably outside the scope of > this patch). > > I tried to sketch all of this out, which seems to work. Let me know what > you think: Yes, your patch looks like a much smaller change than I feared would be the case for a parse_commit_graph refactor. I'll test it out, and probably add some additional fuzzer improvements on top. Thanks for the patch!