Re: [PATCH v4 07/20] test-read-cache: print cache entries with --table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Mar 24 2021, Derrick Stolee wrote:

> On 3/23/21 9:24 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Mar 23 2021, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx>
>>>
>>> This table is helpful for discovering data in the index to ensure it is
>>> being written correctly, especially as we build and test the
>>> sparse-index. This table includes an output format similar to 'git
>>> ls-tree', but should not be compared to that directly. The biggest
>>> reasons are that 'git ls-tree' includes a tree entry for every
>>> subdirectory, even those that would not appear as a sparse directory in
>>> a sparse-index. Further, 'git ls-tree' does not use a trailing directory
>>> separator for its tree rows.
>>>
>>> This does not print the stat() information for the blobs. That will be
>>> added in a future change with another option. The tests that are added
>>> in the next few changes care only about the object types and IDs.
>>> However, this future need for full index information justifies the need
>>> for this test helper over extending a user-facing feature, such as 'git
>>> ls-files'.
>> 
>> Is that stat() information that's going to be essential to grab in the
>> same process that runs the "for (i = 0; i < istate->cache_nr; i++)"
>> for-loop, or stat() information that could be grabbed as:
>> 
>>     git ls-files -z --stage | some-program-that-stats-all-listed-blobs
>
> The point is not to find the stat() data from disk, but to ensure that
> the stat() data is correctly stored in the index (say, after converting
> an existing index from another format). This pipe strategy does not
> allow for that scenario.

So a dump of ce->ce_stat_data, i.e. the same thing ls-files --debug
prints out now, or...?

>> It's not so much that I still disagree as I feel like I'm missing
>> something. I haven't gone through this topic with a fine toothed comb,
>> so ...
>> 
>> If and when these patches land and I'm using this nascent sparse
>> checkout support why wouldn't I want ls-files or another not-a-test-tool
>> to support extracting this new information that's in the index?
>> 
>> That's why I sent the RFC patches at
>> https://lore.kernel.org/git/20210317132814.30175-2-avarab@xxxxxxxxx/ to
>> roll this functionality into ls-files.
>
> And I recommend that you continue to pursue them as an independent
> series, but I'm not going to incorporate them into this one. I'm
> not going to distract from this internal data structure with changes
> to user-facing commands until I think it's ready to use. As the design
> document describes the plan, I don't expect this to be something I
> will recommend to users until most of "Phase 3" is complete, making
> the most common Git commands aware of a sparse index. (I expect to
> fast-track a prototype to willing users that covers that functionality
> while review continues on the mailing list.)

This series is 20 patches. Your current derrickstolee/sparse-index/wip
is another 36, and from skimming those patches & your design doc those
56 seem to be partway into Phase I of IV.

So at the rate things tend to get reviewed / re-rolled & land in git.git
it seems exceedingly likely that we'll have some part-way implementation
of this for at least a major release or two. No?

Which is why I'm suggesting/asking if we shouldn't have something like
this debugging helper as part of installed tooling, because people are
going to try it, it's probably going to have bugs and do other weird
things, and I'd rather not have to manually build some test-tool to
debug some local sparse checkout somewhere.

> Making a change to a builtin is _forever_, and since the only
> purpose right now is to expose the data in a test environment, I
> don't want to adjust the builtin until either there is a real user
> need or the feature has otherwise stabilized. If you want to take on
> that responsibility, then please do.

That's just not the case, we have plenty of unstable debug-esque options
in various built-in commands, in fact ls-files already has a --debug
option whose docs say:

    This is intended to show as much information as possible for manual
    inspection; the exact format may change at any time.

It was added in 84974217151 (ls-files: learn a debugging dump format,
2010-07-31) and "just tacks all available data from the cache onto each
file's line" so in a way not adjusting it and using it would be a
regression, after all this is new data in the cache, so it should print
it :)

There's also PARSE_OPT_HIDDEN for other such in-tree use. Whatever the
sanity/merits of me suggesting that this specific thing be in ls-files
instead of a test-helper, it seems far fetched that something like that
hidden behind a GIT_TEST_* env var (or hidden option, --debug etc.) is
something we'd need to worry about backwards compatibility for.

So, whatever you think about the merits of including this functionality
in ls-files I think your stance of this being a no-go for adding to the
builtin is based on a false premise. It's fine to have
unstable/transitory/debug output in the builtins. We just name &
document them as such.

I also had some feedback in that series and on the earlier iteration
that I think is appropriate to be incorporated into a re-roll of this
one, which doesn't have anything to do with the question of whether we
use ls-files or the helper in the tests. Such as us showing more stuff
into the read-cache.c test-tool v.s. splitting it up making that code
needlessly convoluted.

I don't see how recommending that I pursue that as an independent series
is productive for anyone. So as you re-roll this I should submit another
series on top to refactor your in-flight code & tests?

Either my suggestions are just bad, and we shouldn't do them at all, or
it makes sense to incorporate relevant feedback in re-rolls. I'll let
other reviewers draw their own conclusions on that.

That's not a snarky "I'm right" b.t.w., I may honestly be full of it on
this particular topic.

But if those suggested changes are worth doing at all, then doing them
in that way seems like a massive waste of time for everyone involved, or
maybe I'm not getting what you're suggesting by pursuing them as an
independent series.

> Otherwise, I will need to eventually handle "git ls-files" being
> sparse-aware when eventually removing 'command_requires_full_index',
> (Phase 4) so that would be a good opportunity to adjust the
> expectations.

At which point you'd be adjusting your tests that expect ls-tree format
output to using ls-files output, instead of using ls-files-like output
from the beginning?

At the end of this E-Mail is a patch on top that adds an undocumented
--debug-sparse in addition to the existing --debug. Running that in the
middle of one of your tests:
    
    $ ~/g/git/git ls-files --debug -- a folder1
    a
      ctime: 1616641434:474004002
      mtime: 1616641434:474004002
      dev: 2306     ino: 28576528
      uid: 1001     gid: 1001
      size: 8       flags: 0
    folder1/a
      ctime: 0:0
      mtime: 0:0
      dev: 0        ino: 0
      uid: 0        gid: 0
      size: 0       flags: 40000000
    $ ~/g/git/git ls-files --debug --debug-sparse -- a folder1
    a 
      ctime: 1616641434:474004002
      mtime: 1616641434:474004002
      dev: 2306     ino: 28576528
      uid: 1001     gid: 1001
      size: 8       flags: 0
    folder1/
      ctime: 0:0
      mtime: 0:0
      dev: 0        ino: 0
      uid: 0        gid: 0
      size: 0       flags: 40004000
    $ ~/g/git/git ls-files --stage -- a folder1
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       a
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       folder1/a
    $ ~/g/git/git ls-files --stage --debug-sparse -- a folder1
    100644 e79c5e8f964493290a409888d5413a737e8e5dd5 0       a
    040000 f203181537ff55dcf7896bf8c5b5c35af1514421 0       folder1/

I.e. it gives you everything your helper does and more with a trivial
addition of a --debug-sparse (which we can later just remove, it's a
debug option...).

See e.g. my recent 15c9649730d (grep/log: remove hidden --debug and
--grep-debug options, 2021-01-26) which is already in a release, and
AFAICT nobody has noticed or cared.

I don't know if that's the stat() information you wanted (your WIP
branch doesn't have such a change), but presumably it either is the info
you want, or ls-files's --debug would want to emit any such such info
that's now missing too.

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 13bcc2d8473..e691512d4f8 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -34,6 +34,7 @@ static int show_valid_bit;
 static int show_fsmonitor_bit;
 static int line_terminator = '\n';
 static int debug_mode;
+static int debug_sparse_mode;
 static int show_eol;
 static int recurse_submodules;
 static int skipping_duplicates;
@@ -242,9 +243,17 @@ static void show_ce(struct repository *repo, struct dir_struct *dir,
 		if (!show_stage) {
 			fputs(tag, stdout);
 		} else {
+			unsigned int mode = ce->ce_mode;
+			if (debug_sparse_mode && S_ISSPARSEDIR(mode))
+				/*
+				 * We could just do & 0177777 all the
+				 * time, just make it clear this is
+				 * for --debug-sparse.
+				 */
+				mode &= 0177777;
 			printf("%s%06o %s %d\t",
 			       tag,
-			       ce->ce_mode,
+			       mode,
 			       find_unique_abbrev(&ce->oid, abbrev),
 			       ce_stage(ce));
 		}
@@ -667,6 +676,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 			N_("pretend that paths removed since <tree-ish> are still present")),
 		OPT__ABBREV(&abbrev),
 		OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
+		OPT_BOOL(0, "debug-sparse", &debug_sparse_mode, N_("show sparse debugging data")),
 		OPT_BOOL(0, "deduplicate", &skipping_duplicates,
 			 N_("suppress duplicate entries")),
 		OPT_END()
@@ -681,9 +691,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		prefix_len = strlen(prefix);
 	git_config(git_default_config, NULL);
 
-	if (repo_read_index(the_repository) < 0)
-		die("index file corrupt");
-
 	argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
 			ls_files_usage, 0);
 	pl = add_pattern_list(&dir, EXC_CMDL, "--exclude option");
@@ -700,6 +707,10 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		tag_skip_worktree = "S ";
 		tag_resolve_undo = "U ";
 	}
+	if (debug_sparse_mode) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
 	if (show_modified || show_others || show_deleted || (dir.flags & DIR_SHOW_IGNORED) || show_killed)
 		require_work_tree = 1;
 	if (show_unmerged)
@@ -743,6 +754,12 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		max_prefix = common_prefix(&pathspec);
 	max_prefix_len = get_common_prefix_len(max_prefix);
 
+	/*
+	 * Read the index after parse options etc. have had a chance
+	 * to die early.
+	 */
+	if (repo_read_index(the_repository) < 0)
+		die("index file corrupt");
 	prune_index(the_repository->index, max_prefix, max_prefix_len);
 
 	/* Treat unmatching pathspec elements as errors */




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux