Re: [PATCH 1/1] diffcore-rename: speed up register_rename_src

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 6/8/2019 6:27 PM, René Scharfe wrote:
Am 08.06.19 um 16:42 schrieb Jeff Hostetler via GitGitGadget:
From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>

Teach register_rename_src() to see if new file pair
can simply be appended to the rename_src[] array before
performing the binary search to find the proper insertion
point.

This is a performance optimization.  This routine is called
during run_diff_files in status and the caller is iterating
over the sorted index, so we should expect to be able to
append in the normal case.  The existing insert logic is
preserved so we don't have to assume that, but simply take
advantage of it if possible.

Signed-off-by: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx>
---
  diffcore-rename.c | 13 +++++++++++++
  1 file changed, 13 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 07bd34b631..5bfc5f6c22 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -82,6 +82,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)

  	first = 0;
  	last = rename_src_nr;
+
+	if (last > 0) {
+		struct diff_rename_src *src = &(rename_src[last-1]);
+		int cmp = strcmp(one->path, src->p->one->path);
+		if (!cmp)
+			return src;
+		if (cmp > 0) {
+			first = last;
+			goto append_it;

The goto is not necessary from a logical point of view; the binary
search is skipped even without it because the loop condition below is
not met anymore.

Perhaps this is just a personal style thing, but I prefer the
goto because it makes it clear that we know the answer and are
done searching.  I know it's a small thing and I know we do it
all the time, but setting state here in just the right way as
to defeat the loop would work, but is a little less clear.  But
again, that is a personal style thing I guess.


+		}

You could decrement "last" at this point as a micro-optimization, to use
all three possible outcomes of the comparison to make some progress.

Not sure if any of that would have a _measurable_ impact, though, so I
don't mind the patch going in as is.

Yes, it looks like we could decrement "last" here.  And yes, I
suspect it would have minimal impact.  I'll pass on that in this
series to keep it focused on the advertised goal of simply
appending if possible.


+	}
+
  	while (last > first) {
  		int next = (last + first) >> 1;

Hmm, "last" and "first" are ints as well, so this will give weird results
when "last" > INT_MAX / 2.  I thought 19716b21a4 ("cleanup: fix possible
overflow errors in binary search", 2017-10-08) got us rid of those, but
git grep -n '(.*+.*) >> 1' actually finds some more cases:

    builtin/ls-files.c:376:         int next = (last + first) >> 1;
    diffcore-rename.c:26:           int next = (last + first) >> 1;
    diffcore-rename.c:86:           int next = (last + first) >> 1;
    dir.c:704:              int cmp, next = (last + first) >> 1;
    name-hash.c:349:                        int mid = (begin + end) >> 1;
    read-cache.c:552:               int next = (last + first) >> 1;
    sh-i18n--envsubst.c:252:          size_t j = (j1 + j2) >> 1;

(Not caused by this patch, of course, just noticed it in the context.)

  		struct diff_rename_src *src = &(rename_src[next]);
@@ -95,6 +107,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
  		first = next+1;
  	}

+append_it:
  	/* insert to make it at "first" */
  	ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
  	rename_src_nr++;


Anyway, this here should work as well (and fix the possible overflow),
but may be too terse and quirky:

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 07bd34b631..f2a9007e08 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -76,14 +76,13 @@ static int rename_src_nr, rename_src_alloc;

  static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
  {
-	int first, last;
+	int first, last, next;
  	struct diff_filespec *one = p->one;
  	unsigned short score = p->score;

  	first = 0;
  	last = rename_src_nr;
-	while (last > first) {
-		int next = (last + first) >> 1;
+	for (next = last - 1; last > first; next = first + (last - first) / 2) {
  		struct diff_rename_src *src = &(rename_src[next]);
  		int cmp = strcmp(one->path, src->p->one->path);
  		if (!cmp)


I'd like to limit the scope of this patch series to just
the advertised topic and save the arithmetic fixes to
another patch series like 19716b21a4.

Jeff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux