Esko Luontola wrote:
> Robin Rosenberg wrote on 13.5.2009 8:24:
>> If the conclusion is that this is a way forward, then I
>> could start working on a completely new set of much cleaner patches.
>
> That would be great!
>
> I see that in those early patches you took the approach of converting
> the filenames from the local encoding to UTF-8 at the outer edges of
> Git. That obviously was the easiest way to make the changes with minimal
> changes to Git.
>
> I've been thinking about a somewhat more extensive approach, which
> should serve the interests of all stakeholders:
>
> Currently the tree object contains the following information for each
> file: filename, mode, sha1. To that would be added one more string:
> filename encoding. Unless the encoding is specified (such as in old
> commits before the encoding information was added), the default
> encoding is "binary", which is the same as how Git works now (it treats
> filenames as a series of bytes, ignoring their encoding completely).

[ long and incompatible plan removed ]

> One big question is whether this change will require a change to the
> repository format. Will it be possible to add the encoding field to the
> tree object without breaking compatibility with older Git clients? If
> compatibility needs to be broken, how can it be done in a controlled
> fashion?

Generally when one wants to change one of the basic object types in
git, some extraordinary benefit has to be shown that is not aimed
at just a few people. Academic benefits (i.e., not "real-worldy") do
not fall into that category. In fact, it's so rare for someone to
provide such an enormous benefit that the only time a core object
format in git has been incompatibly changed is when Linus decided that
trees should be able to have subtrees. The change reduced the repository
size for the early git-tracked Linux kernel to about 4% of its
original size, so there was a clear, indisputable and obvious benefit
huge enough to warrant breaking the git repository format entirely
just to get it in (I might have gotten those details entirely wrong,
but it was something along those lines).
So unless you can change tree objects in a way that lets older git
clients understand them while still adding this encoding cruft
(it's cruft to me), I think your chances of getting such a change
into the git core are about the size of the colour green.
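
To make the compatibility problem concrete, here is a rough sketch in
Python (not git's actual C code) of how a tree body is laid out: each
entry is "<mode> <name>\0" followed by a 20-byte binary sha1, and
nothing records the encoding of the name. parse_tree() and make_entry()
are made-up helper names for illustration, and the trailing "UTF-8\0"
field is only an assumed stand-in for the proposed encoding string, not
the layout from the actual proposal.

```python
def make_entry(mode: str, name: bytes, sha1: bytes) -> bytes:
    """Build one raw tree entry: b"<mode> <name>\\0<20-byte sha1>"."""
    return mode.encode("ascii") + b" " + name + b"\0" + sha1


def parse_tree(data: bytes):
    """Yield (mode, name, sha1_hex) for each entry of a raw tree body.

    Hypothetical helper mimicking a parser that only knows the
    existing three fields.
    """
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)       # end of the octal mode
        nul = data.index(b"\0", space + 1)  # end of the filename
        mode = data[pos:space].decode("ascii")
        name = data[space + 1:nul]          # raw bytes, encoding unknown
        sha1 = data[nul + 1:nul + 21]       # fixed-width binary sha1
        if len(sha1) != 20:
            raise ValueError("truncated tree entry")
        yield mode, name, sha1.hex()
        pos = nul + 21                      # next entry starts right here


if __name__ == "__main__":
    sha1 = bytes(range(20))                 # dummy object id
    tree = make_entry("100644", b"p\xe4iv\xe4.txt", sha1)
    print(list(parse_tree(tree)))

    # Assumed extension: an encoding string appended after the sha1.
    try:
        list(parse_tree(tree + b"UTF-8\0"))
    except ValueError as err:
        print("old-style parser fails:", err)
```

A parser written this way has nowhere to skip an extra field, so it
loses sync at the first extended entry. That is roughly the situation
an older client would be in.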
If you're *really* serious about it though, here's how to go about
it:
1. Make the changes so that newer git can always read and operate
   on trees without the encoding information, regardless of what the
   configuration says.
2. Modify the 1.4.x branch to support this new format too, at least
   for reading trees with the information in it. Otherwise some
   package maintainers will just ignore such compatibility.
3. Modify the 1.5.x branch similarly.
4. Make it configurable, but turned off by default and with a big
   fat warning when it's turned on.
5. 2 years later, remove the warning.
6. 2 years later, turn it on by default.
7. 2 years later, remove the config option and make it a new
   major release, but maintain the two codepaths forever.
1.[45].x branches are imaginary. They represent the branch that
gets created when a new release in that series is necessary for
some reason.
I haven't perused Robin's patches enough to know how they would
interact with older git, and I'm not really interested in encoding
issues. With English being the lingua franca of the internet and
open-source development anyway, every project I've ever seen has only
files named in a manner that would fit nicely into 7-bit ASCII.
--
Andreas Ericsson andreas.ericsson@xxxxxx
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.