Re: [PATCH v4 1/8] dir: convert trace calls to trace2 equivalents

Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> · Wed, 12 May 2021 08:26:46 -0400

On 5/11/21 8:44 PM, Elijah Newren wrote:
On Tue, May 11, 2021 at 4:12 PM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:

On 5/11/21 4:12 PM, Elijah Newren wrote:
On Tue, May 11, 2021 at 12:06 PM Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:

On 5/11/21 2:34 PM, Elijah Newren via GitGitGadget wrote:
From: Elijah Newren <newren@xxxxxxxxx>

Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
---
    dir.c                             |  43 +++++--
    t/t7063-status-untracked-cache.sh | 205 ++++++++++++++++++------------
    t/t7519-status-fsmonitor.sh       |   8 +-
    3 files changed, 155 insertions(+), 101 deletions(-)

diff --git a/dir.c b/dir.c
index 3474e67e8f3c..122fcbffdf89 100644
--- a/dir.c
+++ b/dir.c
@@ -2760,15 +2760,34 @@ static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
        return root;
    }

+static void trace2_read_directory_statistics(struct dir_struct *dir,
+                                          struct repository *repo,
+                                          const char *path)
+{
+     if (!dir->untracked)
+             return;
+     trace2_data_string("read_directory", repo, "path", path);

I'm probably just nit-picking here, but should this look more like:

nit-picking and questions are totally fine.  :-)  Thanks for reviewing.


          if (path && *path)
                  trace2_data_string(...)

path is always non-NULL (it'd be an error to call read_directory()
with a NULL path).  So the first part of the check isn't meaningful
for this particular code.  The second half is interesting.  Do we want
to omit the path when it happens to be the toplevel directory (the
case where !*path)?  The original trace_performance_leave() calls
certainly didn't, and I was just trying to provide the same info they
do, as you suggested.  I guess people could determine the path by
knowing that the code doesn't print it when it's empty, but do we want
trace2 users to need to read the code to figure out statistics and
info?

that's fine.  it might be easier to just always print it (even if
blank) so that post-processors know that rather than have to assume
it.


          if (!dir->untracked)
                  return;

Then when you add the visitied fields in the next commit,
you'll have the path with them (when present).

There is always a path with them, it's just that the empty string
denotes the toplevel directory.

(and it would let you optionally avoid the tmp strbuf in
the caller.)

The path in read_directory() is not necessarily NUL-delimited, so
attempting to use it as-is, or even with your checks, would cause us
to possibly print garbage and do out-of-bounds reads.  We need the tmp
strbuf.


I just meant, "if (!len) pass NULL, else build and pass tmp.buf".

Ah, gotcha, that's why you were checking non-NULL.

However, what about the other case when len is nonzero.  Let's say
that len = 8 and path points at
"filename*%&#)aWholeBunchOfTotalGarbageAfterTheRealFilenameThatShouldNotBeReadOrIncluded\0\0\0\0\0\0\0\0\0\0"
?

How do you make it print "filename" and only "filename" without the
other stuff without using the tmp strbuf?


I was still saying to use the "strbuf tmp" in the non-zero len case,
but just pass NULL (or "") for the len==0 case.

Alternatively, since `trace2_read_directory_statistics() a static
local function, we could move all of the path manipulation into it.

static void emit_stats(
	struct dir_struct *dir,
	struct repository *repo,
	const char* path_buf,
	size_t path_len)
{
	if (!path_len)
		trace2_data_string("read_directory", repo,
			"path", "");
	else {
		struct strbuf tmp = STRBUF_INIT;
		strbuf_add(&tmp, path_buf, path_len);
		trace2_data_string("read_directory", repo,
			"path", tmp.buf);
		strbuf_release(&tmp);
	}
	... the rest of intmax stats ...
}


BTW, could we also rename your stats function?  I've been trying
to keep the "trace2_" prefix reserved for the Trace2 API.


Thanks,
Jeff