On 12/30/2020 2:42 PM, Elijah Newren wrote: > On Wed, Dec 30, 2020 at 11:26 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> >> The traverse_trees() method recusively walks through trees, but also > > recursively -- you're missing the second 'r'. Thanks. >> prunes the tree-walk based on a callback. Some callers, such as >> unpack_trees(), are quite complicated and can have wildly different >> performance between two different commands. > > Not sure it belongs in the commit message, but you do have me curious > what you're digging in to... I'm still testing whether my idea will work out. Hopefully soon. ;) >> Create constants that count these values and then report the results at >> the end of a process. These counts are cumulative across multiple "root" >> instances of traverse_trees(), but they provide reproducible values for >> demonstrating improvements to the pruning algorithm when possible. >> >> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> --- >> tree-walk.c | 33 +++++++++++++++++++++++++++++++++ >> unpack-trees.c | 1 - >> 2 files changed, 33 insertions(+), 1 deletion(-) > > From the subject, you are changing tree-walk. unpack-trees depends on > tree-walk, but why is something exposed to it with this kind of > change? Maybe I'll see when I get to it. Oops. I originally reported the stats at the end of unpack_trees(), but it seems I didn't completely back out that change. >> diff --git a/tree-walk.c b/tree-walk.c >> index 0160294712b..2d6226d5f18 100644 >> --- a/tree-walk.c >> +++ b/tree-walk.c >> @@ -4,6 +4,7 @@ >> #include "object-store.h" >> #include "tree.h" >> #include "pathspec.h" >> +#include "json-writer.h" >> >> static const char *get_mode(const char *str, unsigned int *modep) >> { >> @@ -167,6 +168,25 @@ int tree_entry_gently(struct tree_desc *desc, struct name_entry *entry) >> return 1; >> } >> >> +static int traverse_trees_atexit_registered; >> +static int traverse_trees_count; >> +static int traverse_trees_cur_depth; >> +static int traverse_trees_max_depth; >> + >> +static void trace2_traverse_trees_statistics_atexit(void) >> +{ >> + struct json_writer jw = JSON_WRITER_INIT; >> + >> + jw_object_begin(&jw, 0); >> + jw_object_intmax(&jw, "traverse_trees_count", traverse_trees_count); >> + jw_object_intmax(&jw, "traverse_trees_max_depth", traverse_trees_max_depth); >> + jw_end(&jw); >> + >> + trace2_data_json("traverse_trees", the_repository, "statistics", &jw); >> + >> + jw_release(&jw); >> +} > > Yeah, I don't know the json_writer or trace2 stuff; might be nice to > cc Josh Steadmon or someone to review this patch. (Or perhaps he > already reviewed internally?) I guess I could have pointed out that this change is modeled after a similar statistics reporting in 42e50e78 (revision.c: add trace2 stats around Bloom filter usage, 2020-04-06). >> + >> void setup_traverse_info(struct traverse_info *info, const char *base) >> { >> size_t pathlen = strlen(base); >> @@ -180,6 +200,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base) >> info->namelen = pathlen; >> if (pathlen) >> info->prev = &dummy; >> + >> + if (trace2_is_enabled() && !traverse_trees_atexit_registered) { >> + atexit(trace2_traverse_trees_statistics_atexit); >> + traverse_trees_atexit_registered = 1; >> + } >> } >> >> char *make_traverse_path(char *path, size_t pathlen, >> @@ -416,6 +441,12 @@ int traverse_trees(struct index_state *istate, >> int interesting = 1; >> char *traverse_path; >> >> + traverse_trees_count++; >> + traverse_trees_cur_depth++; >> + >> + if (traverse_trees_cur_depth > traverse_trees_max_depth) >> + traverse_trees_max_depth = traverse_trees_cur_depth; >> + >> if (n >= ARRAY_SIZE(entry)) >> BUG("traverse_trees() called with too many trees (%d)", n); >> >> @@ -515,6 +546,8 @@ int traverse_trees(struct index_state *istate, >> free(traverse_path); >> info->traverse_path = NULL; >> strbuf_release(&base); >> + >> + traverse_trees_cur_depth--; > > I double-checked to see if there were any other return sites in this > function. There aren't, which is nice and keeps this clean. > >> return error; >> } >> >> diff --git a/unpack-trees.c b/unpack-trees.c >> index 323280dd48b..02f484604ac 100644 >> --- a/unpack-trees.c >> +++ b/unpack-trees.c >> @@ -1559,7 +1559,6 @@ static void populate_from_existing_patterns(struct unpack_trees_options *o, >> free(sparse); >> } >> >> - > > Did you mean to combine this cleanup with some other patch? If not, > could it be put into its own patch? It should be dropped! Thanks. >> static int verify_absent(const struct cache_entry *, >> enum unpack_trees_error_types, >> struct unpack_trees_options *); >> -- >> gitgitgadget > > Seems like a good change other than a few small nits. I don't know > the json_writer/trace2 stuff, so you might want another reviewer, but > it's only a few lines and seems relatively straightforward. I should point out that an easy way to test this locally is to run GIT_TRACE2_PERF=1 git read-tree -mu HEAD to trigger a full recursive walk. The output JSON payload looks like this: statistics:{"traverse_trees_count":203,"traverse_trees_max_depth":7} Thanks, -Stolee