Re: [PATCH] Use path comparison for patch ids before the file content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Kevin Willford <kcwillford@xxxxxxxxx> writes:

> When limiting the list in a revision walk using cherry pick, patch ids are
> calculated by producing the diff of the content of the files.  This would
> be more efficent by using a patch id looking at the paths that were
> changed in the commits and only if all the file changed are the same fall
> back to getting the content of the files in the commits to determine if
> the commits are the same.

The basic idea of this change makes sense.  When we have many
commits, but if we can tell that no other commit changes the same
set of paths as this commit does, we can immediately know that this
commit cannot have an equivalent other commit among the rest.  By
first computing a lot cheaper "hash of touched paths" for commits,
and throwing them into separate bins keyed by the "hash of touched
paths", you can narrow the commits whose patch IDs must be compared,
and if a bin happens to be a singleton, you do not even need to
produce any patch ID by running a textual diff.  I like it.

Explaining this as "hash of touched paths" is somewhat misleading.
Your "use_path_only" mode actually hashes a lot more than just
paths.  Because the "use_path_only" mode actually hashes the entire
basic diff header and not just paths, it can differentiate a commit
that adds a file and another commit that modifies the same file, for
example.

> ...  This will speed up a rebase where the
> upstream has many changes but none of them have been pulled into the
> current branch.
> ---

Missing sign-off.

>  diff.c      |  16 +++++----
>  diff.h      |   2 +-

The changes in the above two files looked OK to me.
I didn't read the changes to the other three files carefully.

>  patch-ids.c | 114 +++++++++++++++++++++++++++++-------------------------------
>  patch-ids.h |   7 ++--
>  revision.c  |  19 ++--------
>  5 files changed, 73 insertions(+), 85 deletions(-)

>
> diff --git a/patch-ids.c b/patch-ids.c
> index a4d0016..f0262ce 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -4,8 +4,9 @@
> ...
> +}
> \ No newline at end of file

No newline at end of file.


Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]