Re: [PATCH 4/6] copy vs rename detection: avoid unnecessary O(n*m) loops

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Fri, 26 Oct 2007, Junio C Hamano wrote:

> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
> 
> > @@ -2640,27 +2640,21 @@ static void diff_resolve_rename_copy(void)
> >  		 * either in-place edit or rename/copy edit.
> >  		 */
> >  		else if (DIFF_PAIR_RENAME(p)) {
> > +			/*
> > +			 * A rename might have re-connected a broken
> > +			 * pair up, causing the pathnames to be the
> > +			 * same again. If so, that's not a rename at
> > +			 * all, just a modification..
> > +			 *
> > +			 * Otherwise, see if this source was used for
> > +			 * multiple renames, in which case we decrement
> > +			 * the count, and call it a copy.
> >  			 */
> > +			if (!strcmp(p->one->path, p->two->path))
> > +				p->status = DIFF_STATUS_MODIFIED;
> > +			else if (--p->one->rename_used > 0)
> >  				p->status = DIFF_STATUS_COPIED;
> > +			else
> >  				p->status = DIFF_STATUS_RENAMED;
> >  		}
> >  		else if (hashcmp(p->one->sha1, p->two->sha1) ||
> 
> The interaction between the above and ...

Heh.

I'm pretty sure it's right, because:

> > @@ -338,15 +320,25 @@ void diffcore_rename(struct diff_options *options)
> >  				locate_rename_dst(p->two, 1);
> >  		}
> >  		else if (!DIFF_FILE_VALID(p->two)) {
> > +			/*
> > +			 * If the source is a broken "delete", and
> >  			 * they did not really want to get broken,
> >  			 * that means the source actually stays.
> > +			 * So we increment the "rename_used" score
> > +			 * by one, to indicate ourselves as a user
> > +			 */
> > +			if (p->broken_pair && !p->score)
> > +				p->one->rename_used++;
> > +			register_rename_src(p->one, p->score);
> > +		}
> > +		else if (detect_rename == DIFF_DETECT_COPY) {
> > +			/*
> > +			 * Increment the "rename_used" score by
> > +			 * one, to indicate ourselves as a user.
> >  			 */
> > +			p->one->rename_used++;
> > +			register_rename_src(p->one, p->score);
> >  		}
> >  	}
> >  	if (rename_dst_nr == 0 || rename_src_nr == 0)
> >  		goto cleanup; /* nothing to do */
> 
> ... this part feels a bit too subtle for a still-jet-lagged
> brain to grok.  I wonder what happens if the preimage of a
> broken pair is used as the rename source for more than two
> postimages.

The nice thing about the whole counting thing is that we really don't even 
care. What happens is:

 - *if* any rename at all happens, the rename detection will increment the 
   "rename_used" thing even more for the source (once for each rename)

 - so if the rename_used started out non-zero, it will never become zero 
   in diff_resolve_rename_copy(), and every single detected "rename" will 
   be considered a copy - exactly like we want.

 - in other words, a "rename_used++" before rename-detection guarantees 
   that you never see a rename, only a copy of the source.

The above is actually true *even*if* the 

	if (!strcmp(p->one->path, p->two->path))

code in diff_resolve_rename_copy() actually triggers, so we could have 
(and at one point I did) done the "--p->one->rename_used > 0" test before 
the strcmp() test, and it would all continue to work fine. However, the 
reason that I put the strcmp() first is that it needs to be done whether 
we decide to consider it a "copy" or a "rename" (so we cannot avoid it 
anyway), and *if* it triggers, we actually want to avoid the rename_used 
going down to zero anyway (not that it would, because I think it's bound 
to be one of the cases where we pre-incremented the count), so not doing 
the decrement there is equivalent to doing another pre-increment.

So I think it's all right, and more obviously right than it used to be. 
But hey, it's possible that I missed something.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux