On Tue, Aug 16, 2022 at 12:20 AM Justin Donnelly <justinrdonnelly@xxxxxxxxx> wrote:
>
> On Mon, Aug 15, 2022 at 12:00 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Phillip Wood <phillip.wood123@xxxxxxxxx> writes:
> >
> > > I had not commented as I don't use the prompt. I have just had a quick
> > > read and I wonder if it would be more efficient to use
> > > git diff --cached --quiet --diff-filter=U
> > > rather than
> > > git ls-files --unmerged 2>/dev/null
> > > to check if there are unmerged entries,
> >
> > The former reads the on-disk index into in-core index, and reads
> > tree objects (recursively for subdirectories) referenced by the
> > HEAD, walks both in parallel to find differences and filters out the
> > result to unmerged (I am not sure how well diff-filter works with
> > unmerged paths, though).
> >
> > The latter reads the on-disk index into in-core index, scans the
> > entries and finds unmerged entries.
> >
> > So if we compare the overhead to run either command standalone, I am
> > reasonably sure that the latter would be a lot more efficient.
> >
>
> Here's how I tested performance. The setup and test execution code are
> below. I tested each technique (`git ls-files --unmerged 2>/dev/null`
> and `git diff --cached --quiet --diff-filter=U`) 3 times and listed
> the results. Please let me know if you see any problems with the
> methodology.
>
> Setup:
> mkdir -p /tmp/perf/base && cd /tmp/perf/base
> git clone https://github.com/torvalds/linux.git
>
> Each test:
> cd /tmp/perf
> rm -rf test
> cp -r base test
> cd test/linux/drivers/watchdog
> git switch --detach v6.0-rc1 # pick some commit for consistency
> for f in *; do echo "/* a */" >> $f; done # 182 files
> git stash
> for f in *; do echo "/* b */" >> $f; done
> git commit -am "adding to end of files in watchdog directory"
> git stash pop
> time [[ $(git ls-files --unmerged 2>/dev/null) ]]
> # OR run the next one instead
> # time ! git diff --cached --quiet --diff-filter=U 2>/dev/null
>
> Results (hopefully this text lines up better for others than it does for me):
> time [[ $(git ls-files --unmerged 2>/dev/null) ]]
>        run 1      run 2      run 3
> real   0m0.008s   0m0.009s   0m0.008s
> user   0m0.005s   0m0.001s   0m0.004s
> sys    0m0.004s   0m0.008s   0m0.004s
>
> time ! git diff --cached --quiet --diff-filter=U 2>/dev/null
>        run 1      run 2      run 3
> real   0m0.009s   0m0.009s   0m0.007s
> user   0m0.004s   0m0.009s   0m0.007s
> sys    0m0.004s   0m0.000s   0m0.000s
>

Actually, what's probably more important is how long the commands take
when there is no conflict (that will be a far more common situation). I
tested that today, and the numbers were about the same.

> As you can see, the results are basically the same. I'm not sure if
> real world usage would yield different results. So for now, I'll defer
> to Junio's analysis and stick with `ls-files --unmerged`.
>
> > But if the shell prompt code already needs to run the diff-index for
> > other reasons (e.g. to show if there is any modification added to
> > the index), that may change the equation. Instead of adding a
> > separate and extra call to "ls-files -u", it might be more efficient
> > if you can somehow piggy-back on an existing diff-index call. For
> > example, you may be running "git diff --cached --quiet" for exit code
> > to show if any change has been added, but you can instead run "git
> > diff --no-ext-diff --no-renames --cached --name-status" and (1) if
> > there is any output, then the index is dirty, and (2) if there is a
> > line that begins with "U", you have an unmerged path right there.
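
(To make the piggy-backing idea concrete, here is an untested sketch of
what such a consolidated check might look like. The variable names and
the "|CONFLICT" label below are placeholders made up for illustration,
not anything taken from the actual prompt code.)

    # One "git diff" call drives both indicators.
    output=$(git diff --no-ext-diff --no-renames --cached --name-status 2>/dev/null)
    staged=""
    conflict=""
    if [ -n "$output" ]; then
        staged="+"                              # (1) any output at all: the index is dirty
        if printf '%s\n' "$output" | grep -q '^U'; then
            conflict="|CONFLICT"                # (2) a line starting with "U": unmerged path
        fi
    fi

The grep only runs when the index is dirty, so the common clean case
still costs a single git invocation.
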
>
> It had not occurred to me to consolidate and piggy-back off of other
> commands. I find the idea intriguing, but am not sure it makes sense
> to do it for only a single feature (especially this feature since the
> time to determine the conflict is short). I think it would make the
> code more complex, and it might be better to take a holistic approach
> to such an effort. Let me know if you strongly disagree.
>
> Thanks,
> Justin
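
For reference, the standalone check being kept boils down to something
like this in the prompt code (again just a sketch; the "conflict"
variable and the "|CONFLICT" label are illustrative placeholders, not
necessarily the names the patch uses):

    conflict=""
    # "git ls-files --unmerged" prints nothing unless the index has unmerged entries
    if [ -n "$(git ls-files --unmerged 2>/dev/null)" ]; then
        conflict="|CONFLICT"
    fi
    # ...$conflict is then appended to the prompt string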