Junio, thank you for your patience to review my patch and guide me how to modify it. Junio C Hamano <gitster@xxxxxxxxx> 于2021年1月15日周五 上午8:59写道: > > "阿德烈 via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > From: ZheNing Hu <adlternative@xxxxxxxxx> > > > > In order to provide users a better experience > > when viewing information about files in the index > > and the working tree, the `--dedup` option will suppress > > some duplicate options under some conditions. > > > > In a merge conflict, one item of "git ls-files" output may > > appear multiple times. For example,now the file `a.c` has > > a conflict,`a.c` will appear three times in the output of > > "git ls-files".We can use "git ls-files --dedup" to output > > `a.c` only one time.(unless `--stage` or `--unmerged` is > > used to view all the detailed information in the index) > > Unlike these option names we see in the description, "dedup" is not > a full word. Perhaps spell it fully "--deduplicate" while letting > parse-options machinery to accept unique prefix (including > "--dedup"? > Ok i have modified "--dedup" to "--deduplicate". > > In addition, if you use both `--delete` and `--modify` in > > the same time, The `--dedup` option can also suppress modified > > "at the same time", I think. > My poor English grammar :-) > > entries output. > > [let's call this point "point A"] > > > `--dedup` option relevant descriptions in > > `Documentation/git-ls-files.txt`, > > I am not sure what this means. > > > the test script in `t/t3012-ls-files-dedup.sh` > > prove the correctness of the `--dedup` option. > > No amount of tests "proves" any correctness, but that is OK. I > think you meant to say "a few tests have been added to t3012 to > protect the new feature from future breakage" or something like > that. > Alright, I understand! > In any case, I think everything after "point A" and before your sign > off does not belong to the log message. The diffstat shows that > documentation and tests have been added already. > > > +--dedup:: > > + Suppress duplicate entries when conflict happen > > "conflict happen" -> "there are unmerged paths", as the term > "unmerged" is already shown to readers of "ls-files --help". > Well, maybe I'm not good enough with these proper nouns. > > + or `--deleted` and `--modified` are combined. > > I somehow thought that you refrained from deduping when you are > showing the stages with "ls-files -u" and "ls-files -s", or you are > showing status with "ls-files -t", because you will otherwise lose > information. In other words, showing only one cache entry out of > many that share the same name makes sense only when we are showing > name and nothing else. > You are right, "--deduplicate" should only work on duplicate file names, so "ls-files -t" also needs to be corrected. Well,This is true a bug I haven't notice. > Having said all that, I suspect that we may be much better off if we > can somehow merge the two loops into one. You may be dedup adjacent > entries in each loop separately with the approach taken by this > patch, but I do not think the patch would work to deduplicate across > two loops. For example, what happens if you do this? > > $ git reset --hard > $ echo >>builtin/ls-files.c > $ git ls-files -c -m builtin/ls-files.c > $ git ls-files -t -c -m builtin/ls-files.c > > I think you see the path twice in the output, with or without your > --dedup option (remember what I said about proving, by the way? ;-)). > Yeah,This is because I may have missed the -c option with other options at the same time. Here I may disagree with your point of view: if (errno != E_NOENT) error_errno("cannot lstat '%s'", fullname.buf); With this sentence included, the patch will fail the test: t/t3010-ls-files-killed-modified.sh. the errno maybe ENOTDIR when you try to lstat a file`r` with `lstat("r/f",&st);` So I temporarily removed the judgment of errno. > #2: consolidate two for loops into one. > > The two loops have slightly different condition to skip a ce, > and different logic on what tag each path is shown with. When > --cached and --modified or --deleted are asked for at the same > time, we'd show them multiple times (this is done inside the > loop for each ce) > > if (show_cached || show_stage) > show_ce(... ce_stage(ce) ? tag_unmerged : ...); > err = lstat(fullname.buf, &st); > if (err) { > /* deleted? */ > ... code that corresponds to the > ... illustration in #1 above come here. > } else if (...) > show_ce(..., tag_modified); > > This changes the semantics. The original iterates the index > twice, so you may see the same entry from --cached once and > then again from --modified. The updated one still will show > the same entry twice but next to each other. > Well,This does change the semantics. I think people who used two for loops before may want to separate different outputs. Now, if you don’t use "--deduplicate", You may see six consecutive items under a combination of multiple options. > #3: optionally deduplicate. > > Once we have a single loop, deduplicationg based on names is > trivial, as we seen before. > > Indeed so. > Hmm? THANKS. Junio C Hamano <gitster@xxxxxxxxx> 于2021年1月15日周五 上午8:59写道: > > "阿德烈 via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > From: ZheNing Hu <adlternative@xxxxxxxxx> > > > > In order to provide users a better experience > > when viewing information about files in the index > > and the working tree, the `--dedup` option will suppress > > some duplicate options under some conditions. > > > > In a merge conflict, one item of "git ls-files" output may > > appear multiple times. For example,now the file `a.c` has > > a conflict,`a.c` will appear three times in the output of > > "git ls-files".We can use "git ls-files --dedup" to output > > `a.c` only one time.(unless `--stage` or `--unmerged` is > > used to view all the detailed information in the index) > > Unlike these option names we see in the description, "dedup" is not > a full word. Perhaps spell it fully "--deduplicate" while letting > parse-options machinery to accept unique prefix (including > "--dedup"? > > > In addition, if you use both `--delete` and `--modify` in > > the same time, The `--dedup` option can also suppress modified > > "at the same time", I think. > > > entries output. > > [let's call this point "point A"] > > > `--dedup` option relevant descriptions in > > `Documentation/git-ls-files.txt`, > > I am not sure what this means. > > > the test script in `t/t3012-ls-files-dedup.sh` > > prove the correctness of the `--dedup` option. > > No amount of tests "proves" any correctness, but that is OK. I > think you meant to say "a few tests have been added to t3012 to > protect the new feature from future breakage" or something like > that. > > In any case, I think everything after "point A" and before your sign > off does not belong to the log message. The diffstat shows that > documentation and tests have been added already. > > > +--dedup:: > > + Suppress duplicate entries when conflict happen > > "conflict happen" -> "there are unmerged paths", as the term > "unmerged" is already shown to readers of "ls-files --help". > > > + or `--deleted` and `--modified` are combined. > > I somehow thought that you refrained from deduping when you are > showing the stages with "ls-files -u" and "ls-files -s", or you are > showing status with "ls-files -t", because you will otherwise lose > information. In other words, showing only one cache entry out of > many that share the same name makes sense only when we are showing > name and nothing else. > > Has that been changed from the previous rounds? > > > diff --git a/builtin/ls-files.c b/builtin/ls-files.c > > index c8eae899b82..bc4eded19ab 100644 > > --- a/builtin/ls-files.c > > +++ b/builtin/ls-files.c > > @@ -316,6 +318,20 @@ static void show_files(struct repository *repo, struct dir_struct *dir) > > for (i = 0; i < repo->index->cache_nr; i++) { > > const struct cache_entry *ce = repo->index->cache[i]; > > > > + if (show_cached && delete_dup) { > > + switch (ce_stage(ce)) { > > + case 0: > > + default: > > + break; > > This part looks somewhat strange for two reasons: > > - The code enumerates ALL the possible stage numbers from 0 to 3; > if we were to have "default", I'd expect it would be a separate > switch arm from the possible values that calls out an programming > error, e.g. BUG("at stage #%d???", ce_stage(ce)). Simply removing > the "default" arm would be another way out of this strangeness. > > - When we see a stage #0 entry, we know we will not have higher > stage entries with the same name. Not clearing last_stage here > feels wrong, as the primary reason why last_stage variable is > used is to remember the last ce that was shown, so that other > entries with the same name can be skipped. > > By the way, "last_shown_ce" may be a much better name for the > variable, as you do not really care what stage number the ce you > showed last was at (you care about its name). > > Also, I do not see a good reason why the last_shown_ce variable > should have lifetime longer than the block that contains this for() > loop (and the other for loop for deleted/modified codepath we see > later). Especially since you initialize the variable that you made > visible to the entire function to NULL before entering the first for > loop, but you do not set it back to NULL before entering the second > for loop, it is inviting a subtle bug. You may have been given > show_cached and show_modified at the same time, so you enter the > first loop and have shown the first stage of the last conflicted > path, whose cache entry is left in the last_stage variable. Since > the variable has longer lifespan than necessary, when the second > loop is entered, it still points at the cache entry for the highest > stage of the last conflicted path. That is because the code forgets > to clear it to NULL before entering the second for loop. > > Having said all that, I suspect that we may be much better off if we > can somehow merge the two loops into one. You may be dedup adjacent > entries in each loop separately with the approach taken by this > patch, but I do not think the patch would work to deduplicate across > two loops. For example, what happens if you do this? > > $ git reset --hard > $ echo >>builtin/ls-files.c > $ git ls-files -c -m builtin/ls-files.c > $ git ls-files -t -c -m builtin/ls-files.c > > I think you see the path twice in the output, with or without your > --dedup option (remember what I said about proving, by the way? ;-)). > > Once we successfully merged two loops into one, the part that shows > tracked paths in the function would have only one loop, and it would > become a lot cleaner to add the logic to "skip showing the ce if it > has the same name as the previously shown one, only when doing so > won't lose information", by doing something like this: > > static void show_files(....) > { > /* show_others || show_killed done here */ > ... > > /* leave early if not showing anything */ > if (! (show_cached || show_stage || show_deleted || show_modified)) > return; > > last_shown_ce = NULL; > for (i = 0; i < repo->index->cache_nr; i++) { > const struct cache_entry *ce = repo->index->cache[i]; > > if (skipping_duplicates && last_shown_ce) > if (!strcmp(ce->name, last_shown_ce->name)) > continue; > > construct_fullname(); > > /* various reasons to skip the entry tested */ > if (showing ignored directory and ce is excluded) > continue; > if (show_unmerged && !ce_stage(ce)) > continue; > if (ce->ce_flags & CE_UPDATE) > continue; > ... other reasons may appear here ... > > /* now we are committed to show it */ > last_shown_ce = ce; > > ... various different ways to show ce come here ... > show_ce(...); > } > } > > where "skipping_duplicates" would be set when "--deduplicate" is > asked and we are not showing information other than the pathname > via various options e.g. the tags (-t) or stages (-s/-u). > > > + if (delete_dup && show_deleted && show_modified && err) > > show_ce(repo, dir, ce, fullname.buf, tag_removed); > > I actually think the original code that is still shown here ... > > > + else { > > + if (show_deleted && err) > > + show_ce(repo, dir, ce, fullname.buf, tag_removed); > > + if (show_modified && ie_modified(repo->index, ce, &st, 0)) > > ... about modified file is buggy. If lstat() failed, then &st has > no useful information, so it is wrong to feed it to ie_modified(). > > Perhaps a three-patch series that is structured like this may be in > order? > > #1: bugfix for --deleted and --modified. > > err = lstat(fullname.buf, &st); > if (err) { > /* deleted? */ > if (errno != E_NOENT) > error_errno("cannot lstat '%s'", fullname.buf); > else { > if (show_deleted) > show_ce(..., tag_removed); > if (show_modified) > show_ce(..., tag_modified); > } > } else if (show_modified && ie_modified(...)) > show_ce(..., tag_modified); > > This hopefully should not change the semantics. If you ask > --deleted and --modified, a deleted path would be listed twice. > > #2: consolidate two for loops into one. > > The two loops have slightly different condition to skip a ce, > and different logic on what tag each path is shown with. When > --cached and --modified or --deleted are asked for at the same > time, we'd show them multiple times (this is done inside the > loop for each ce) > > if (show_cached || show_stage) > show_ce(... ce_stage(ce) ? tag_unmerged : ...); > err = lstat(fullname.buf, &st); > if (err) { > /* deleted? */ > ... code that corresponds to the > ... illustration in #1 above come here. > } else if (...) > show_ce(..., tag_modified); > > This changes the semantics. The original iterates the index > twice, so you may see the same entry from --cached once and > then again from --modified. The updated one still will show > the same entry twice but next to each other. > > #3: optionally deduplicate. > > Once we have a single loop, deduplicationg based on names is > trivial, as we seen before. > > > Hmm?