Re: Strange merge failure (would be overwritten by merge / cannot merge)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I was looking at this loop in traverse_trees() from tree-walk.c

	for (i = 0; i < n; i++) {
		if (!t[i].size)
			continue;
		entry_extract(t+i, entry+i);
		if (last >= 0) {
			int cmp = entry_compare(entry+i, entry+last);

			/*
			 * Is the new name bigger than the old one?
			 * Ignore it
			 */
			if (cmp > 0)
				continue;
			/*
			 * Is the new name smaller than the old one?
			 * Ignore all old ones
			 */
			if (cmp < 0)
				mask = 0;
		}
		mask |= 1ul << i;
		if (S_ISDIR(entry[i].mode))
			dirmask |= 1ul << i;
		last = i;
	}

The logic to update last breaks down while merging these three trees:

    Tree #1: tree "t"
    Tree #2: blob "t"
             blob "t-f"
    Tree #3: blob "t-f"
             tree "t"

When looking at Tree #3, "last" points at "t" (blob) from Tree #2
and we decide "t-f" comes later than that to ignore it because "cmp > 0".

We instead somehow need to look ahead and find "t", while remembering to
revisit "t-f" from it in the later rounds (of an outer loop that has this
loop inside).

Also the second comparison to update last breaks down while merging these:

    Tree #1: blob "t-f" 
             tree "t"
    Tree #2: blob "t"
    Tree #3: blob "t"
             blob "t-f"

When looking at Tree #2, "last" points at "t-f" (blob) from Tree #1 and we
decide to use "t" because it sorts earlier.  We will match "t" from Tree #3
but we somehow need to go back to Tree #1 and look beyond "t-f" to find
the matchig "t" from there as well, while remembering that we need to
revisit "t-f" from it in the later rounds..

I am thinking about changing the function this way.

 (0) take an optional "candidate for the earliest name" from the caller,
     in a new member in traverse_info structure. unpack_callback() would
     give the path of the index entry it is looking at after stripping the
     leading paths as necessary.

 (1) find the "earliest name" N among the given trees (and the optional
     additional candidate from the above).  The "earliest" is computed by
     comparing names as if everything were a blob;

 (2) for each tree t[i], if the current entry entry[i] compares
     differently with N between the cases where the N were a blob and
     where N were a tree, and if entry[i] is a blob, a tree with the name
     N might be hidden behind it.  remember the current position so we can
     come back, and scan forward to find such tree (this does not have to
     run to the end of the tree), and if we do find one, then instead of
     saying "we do not have an entry with that name", use that tree in
     entry[i].  This will collect entries with name N from all trees in
     the entry[] array;

 (3) make the callback as usual; unpack_callback(), after processing this
     round with entries with name N, may or may not advance o->pos but if
     it did so, it would also update the "candidate for the earliest name"
     field passed back in the traverse_info structure to prepare for the
     next round.

 (4) traverse_trees() will go back to step (0).

The tricky part is step (1) and the latter half of step (2) to skip,
rewind, and avoid re-processing what was processed already after
rewinding, and the progress is much slower than I would like.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]