Re: [PATCH 1/8] tree-walk: report recursion counts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/30/2020 2:42 PM, Elijah Newren wrote:
> On Wed, Dec 30, 2020 at 11:26 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@xxxxxxxxx> wrote:
>>
>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>>
>> The traverse_trees() method recusively walks through trees, but also
> 
> recursively -- you're missing the second 'r'.

Thanks.

>> prunes the tree-walk based on a callback. Some callers, such as
>> unpack_trees(), are quite complicated and can have wildly different
>> performance between two different commands.
> 
> Not sure it belongs in the commit message, but you do have me curious
> what you're digging in to...

I'm still testing whether my idea will work out. Hopefully soon. ;)
 
>> Create constants that count these values and then report the results at
>> the end of a process. These counts are cumulative across multiple "root"
>> instances of traverse_trees(), but they provide reproducible values for
>> demonstrating improvements to the pruning algorithm when possible.
>>
>> Signed-off-by: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>> ---
>>  tree-walk.c    | 33 +++++++++++++++++++++++++++++++++
>>  unpack-trees.c |  1 -
>>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> From the subject, you are changing tree-walk.  unpack-trees depends on
> tree-walk, but why is something exposed to it with this kind of
> change?  Maybe I'll see when I get to it.

Oops. I originally reported the stats at the end of unpack_trees(), but
it seems I didn't completely back out that change.

>> diff --git a/tree-walk.c b/tree-walk.c
>> index 0160294712b..2d6226d5f18 100644
>> --- a/tree-walk.c
>> +++ b/tree-walk.c
>> @@ -4,6 +4,7 @@
>>  #include "object-store.h"
>>  #include "tree.h"
>>  #include "pathspec.h"
>> +#include "json-writer.h"
>>
>>  static const char *get_mode(const char *str, unsigned int *modep)
>>  {
>> @@ -167,6 +168,25 @@ int tree_entry_gently(struct tree_desc *desc, struct name_entry *entry)
>>         return 1;
>>  }
>>
>> +static int traverse_trees_atexit_registered;
>> +static int traverse_trees_count;
>> +static int traverse_trees_cur_depth;
>> +static int traverse_trees_max_depth;
>> +
>> +static void trace2_traverse_trees_statistics_atexit(void)
>> +{
>> +       struct json_writer jw = JSON_WRITER_INIT;
>> +
>> +       jw_object_begin(&jw, 0);
>> +       jw_object_intmax(&jw, "traverse_trees_count", traverse_trees_count);
>> +       jw_object_intmax(&jw, "traverse_trees_max_depth", traverse_trees_max_depth);
>> +       jw_end(&jw);
>> +
>> +       trace2_data_json("traverse_trees", the_repository, "statistics", &jw);
>> +
>> +       jw_release(&jw);
>> +}
> 
> Yeah, I don't know the json_writer or trace2 stuff; might be nice to
> cc Josh Steadmon or someone to review this patch.  (Or perhaps he
> already reviewed internally?)

I guess I could have pointed out that this change is modeled after
a similar statistics reporting in 42e50e78 (revision.c: add trace2
stats around Bloom filter usage, 2020-04-06).

>> +
>>  void setup_traverse_info(struct traverse_info *info, const char *base)
>>  {
>>         size_t pathlen = strlen(base);
>> @@ -180,6 +200,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base)
>>         info->namelen = pathlen;
>>         if (pathlen)
>>                 info->prev = &dummy;
>> +
>> +       if (trace2_is_enabled() && !traverse_trees_atexit_registered) {
>> +               atexit(trace2_traverse_trees_statistics_atexit);
>> +               traverse_trees_atexit_registered = 1;
>> +       }
>>  }
>>
>>  char *make_traverse_path(char *path, size_t pathlen,
>> @@ -416,6 +441,12 @@ int traverse_trees(struct index_state *istate,
>>         int interesting = 1;
>>         char *traverse_path;
>>
>> +       traverse_trees_count++;
>> +       traverse_trees_cur_depth++;
>> +
>> +       if (traverse_trees_cur_depth > traverse_trees_max_depth)
>> +               traverse_trees_max_depth = traverse_trees_cur_depth;
>> +
>>         if (n >= ARRAY_SIZE(entry))
>>                 BUG("traverse_trees() called with too many trees (%d)", n);
>>
>> @@ -515,6 +546,8 @@ int traverse_trees(struct index_state *istate,
>>         free(traverse_path);
>>         info->traverse_path = NULL;
>>         strbuf_release(&base);
>> +
>> +       traverse_trees_cur_depth--;
> 
> I double-checked to see if there were any other return sites in this
> function.  There aren't, which is nice and keeps this clean.
> 
>>         return error;
>>  }
>>
>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 323280dd48b..02f484604ac 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -1559,7 +1559,6 @@ static void populate_from_existing_patterns(struct unpack_trees_options *o,
>>         free(sparse);
>>  }
>>
>> -
> 
> Did you mean to combine this cleanup with some other patch?  If not,
> could it be put into its own patch?

It should be dropped! Thanks.

>>  static int verify_absent(const struct cache_entry *,
>>                          enum unpack_trees_error_types,
>>                          struct unpack_trees_options *);
>> --
>> gitgitgadget
> 
> Seems like a good change other than a few small nits.  I don't know
> the json_writer/trace2 stuff, so you might want another reviewer, but
> it's only a few lines and seems relatively straightforward.

I should point out that an easy way to test this locally is
to run

  GIT_TRACE2_PERF=1 git read-tree -mu HEAD

to trigger a full recursive walk. The output JSON payload looks
like this:

  statistics:{"traverse_trees_count":203,"traverse_trees_max_depth":7}

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux