Re: [PATCH 1/5] Add 'df_name_compare()' helper function

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Thu, 6 Mar 2008 07:58:48 -0800 (PST)

On Thu, 6 Mar 2008, David Kastrup wrote:
> 
> Uh, that screams just bloody murder at me with regard to working on
> sorted material.  You'll never even get into the situation where the
> conflicting "equal" entries will be compared if you presorted them with
> something in between.

And you really can't.

> Consider the case of a merge of
> 
> A:
> blubb
> blubb.hi
> 
> B:
> blubb.hi
> blubb/
> 
> Any traversal that is done reasonably efficiently will never compare
> blubb to blubb/ and I don't see how to get around this.

Correct. There _is_ no sort order that will find the conflict in a single 
pass, yet get the sorting right for this case, because there is no sort 
that can satisfy both the "get DF conflicts together" _and_ the "keep 
things in the right order" requirements.

Btw, the old code was no better (and was probably worse) - it used 
"strcmp()" to order the things, so it effectively did the same thing, it 
just wasn't as obvious about it (and got the case of "foo/" vs "foo." 
entirely wrong). The new code makes this hack very explicit.

> Basically, I think you need a special traversal routine.

Yes, we need to handle it in two passes. Which is actually hopefully not 
all that hard, but it is totally impossible (at least for me) with the old 
code that was so hard to follow.

So what I did was to rewrite the code so that it can be followed and 
maintained, and now my plan is to simply put the result in a separate 
index. 

We've always really wanted to do that *anyway* (right now we have this 
"discard and re-read index" cruft in the callers and in the error cases to 
get back the index we overwrote when merging!), and I bet we could have 
done that with the old code too, but _I_ couldn't have done it.

The problem with having a single index as both source and destination is 
that you cannot do a two-phase thing, because you are basically destroying 
one of your sources in your first phase (or put another way: in the second 
phase, you don't really know whether the index entry you now find was 
there to begin with, of whether you just generated it yourself earlier).

So what that series of five patches do is to not fix a single bug, but to 
make the thing a bit easier for me to look at (and I think it's good for 
other reasons too - one less arbitrary tree traversal function!)

I'm hoping that people can look at the patches and test them, and say 
"yeah, that looks like it does the same thing we used to", and then when I 
get the energy (hopefully later today) I'll just make it take a separate 
source and destination index.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html