Re: [RFC/PATCH] patch-ids: check modified paths before calculating diff

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Jeff King <peff@xxxxxxxx> writes:

> I think such a loose patch-id could just be a hash of the filenames that
> were changed by the patch (e.g., the first 32-bits of the sha1 of the
> concatenated filenames). Computing that should be about as expensive as
> a tree-diff. Per observation 2 above, if two commits do not have the
> same loose id, we know that they cannot possibly have the same strict
> id.

Because the "strict" one already hashes the filenames, if files that
are touched by a patch is different from that of another patch, we
judge them being different.

> Then we can forget about the smaller-side and bigger-side entirely, and
> just do something like:
>
>   1. Make a sorted list (or hash table) of loose ids for one side.
>
>   2. For each commit on the other side, calculate its loose id and look
>      that up in the sorted list. If no hits, we know that there is no
>      match. For any hits, lazily calculate (and cache) the strict patch
>      id for both sides and compare as usual.
>
> In the best case, we compute no patch-ids at all. And even for the
> average case, I'd expect our lazy calculation to only have to compute a
> handful of ids.

Correct.

This has rather interesting ramifications on cherry-pick and rebase,
though.  Both command can handle changes that come from an old tree
before some paths were renamed, but strict patch-id would not spot
equivalent changes we already have in our history if our change
happened after a rename, i.e.

   Z
  /
 O---R---X---Y

where Z updates path F, R moves F to G and X changes G the same way
as Z changes F, and we are trying to cherry-pick Z on top of Y.  The
cherry-pick filter will see different patch-id for Z and X.

We will likely to notice that "patch already applied" (if using am-3
machinery) or "already up-to-date" (if using merge machinery) even
when we missed this equivalency and drop the duplicate from the
result, so it is not a big loss, but we might want to consider
removing the filename from patch-id computation, at least for the
ones we internally use and discard for revs->cherry_pick filtering.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]