Re: Rename detection at git log

Junio C Hamano <junkio@xxxxxxx> writes:

> There are a few things we need to be careful about rename/copy.
>...
>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skeleton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.
>
> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

If people are well disciplined, code refactoring (which can
trigger rename/copy detection) tends to affect both the source and
destination files in the same commit, so -C often finds what
you want without --find-copies-harder.
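
To make the difference in candidate sets concrete, here is a
rough sketch in Python.  It is not git's code: trees are modelled
as plain dicts from path to content, and the function name and the
"harder" parameter are made up for illustration.

```python
def copy_source_candidates(old_tree, new_tree, harder=False):
    """Return the paths in old_tree that may be tried as copy sources.

    old_tree and new_tree are hypothetical stand-ins for git trees:
    dicts mapping a path to that file's content.
    """
    if harder:
        # Like --find-copies-harder: every path in the older tree
        # is a candidate, even if the commit did not touch it.
        return sorted(old_tree)
    # Like plain -C: only paths the commit modified or deleted.
    return sorted(path for path, content in old_tree.items()
                  if new_tree.get(path) != content)


old = {"chips/Kconfig": "menu\nsensor\n", "chips/README": "docs\n"}
new = {"chips/Kconfig": "menu\n", "chips/README": "docs\n",
       "hwmon/Kconfig": "sensor\n"}

print(copy_source_candidates(old, new))               # modified paths only
print(copy_source_candidates(old, new, harder=True))  # the whole older tree
```

With these toy trees the first call yields only chips/Kconfig,
while the "harder" call also offers the untouched chips/README.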

But sometimes the source stays the same and you literally have a
duplicate (possibly with some modifications) at the new
destination.  Finding an exact copy is cheap: diffcore-rename has a
double loop that first finds exact copies without similarity
estimation, and only then goes on to open blobs and do its
similarity magic for destinations whose origin is still unknown.
Copy/rename with edits is not cheap, and the "harder" variant
feeds _everything_ from the older tree as a candidate copy
source, so it is very expensive for huge projects.
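
The shape of that two-pass loop can be sketched roughly as below.
This is an illustration only, not what diffcore-rename actually
does: it hashes content itself instead of comparing object names
git already knows, difflib's ratio stands in for git's similarity
estimation, and the function names and the 0.5 threshold are
assumptions of the sketch.

```python
import hashlib
from difflib import SequenceMatcher


def _digest(content):
    # Stand-in for a blob's object name.
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def find_copies(sources, destinations, threshold=0.5):
    """Map each destination path to a (source path, score) pair.

    sources and destinations are dicts of path -> content.
    """
    found = {}

    # Pass 1: exact copies, matched by hash alone (cheap).
    by_hash = {}
    for src, content in sources.items():
        by_hash.setdefault(_digest(content), src)
    for dst, content in destinations.items():
        src = by_hash.get(_digest(content))
        if src is not None:
            found[dst] = (src, 1.0)

    # Pass 2: for destinations still without an origin, compare
    # contents against every candidate source (expensive).
    for dst, content in destinations.items():
        if dst in found:
            continue
        best = None
        for src, src_content in sources.items():
            score = SequenceMatcher(None, src_content.splitlines(),
                                    content.splitlines()).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (src, score)
        if best is not None:
            found[dst] = best
    return found


sources = {"orig.txt": "a\nb\nc\nd\n", "other.txt": "x\ny\n"}
destinations = {"exact.txt": "a\nb\nc\nd\n",
                "edited.txt": "a\nb\nc\ne\n",
                "new.txt": "q\n"}
print(find_copies(sources, destinations))
```

The second pass costs (unmatched destinations) x (candidate
sources) content comparisons, which is why growing the source set
to the whole older tree hurts so much.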

> In the kernel archive, 
>
> 	git show -C ad2f931d
>
> tells us that:
>
>  - drivers/i2c/chips/Kconfig lost major part of it and only
>    skeletal part of the original remains in it;
>
>  - major part of it went to drivers/hwmon/Kconfig;
>
> The story is similar to the Makefile next door.

Having said all that, I think rename/copy as a wholesale
operation on one file is an uninteresting special case.  The
generic case, which happens far more often in practice, is
lines moving around across files, and the new "git blame" gives
you a better picture to answer the "where the heck did this come
from" question.

For example,

	git blame -f -n -C 'ad2f931d^!' -- drivers/hwmon/Kconfig

on the same commit would show that many, but not all, of its
lines came from drivers/i2c/chips/Kconfig.

There are quite a few other things I should probably mention for
new people on the list about the rename/copy/break heuristics, but
it is getting late so I'll defer that to some other time.
