On 10/9/12 12:17 PM, John Whitney wrote:
> Thank you very much for your detailed explanations. I suspected that
> efficiency concerns might be preventing a clean solution.
>
> How about this idea... When git stores files, it could include a bit
> of metadata that tells it whether the file is a binary blob or text.
> (Perhaps it already does this?) If a binary blob (in the repository)
> is being compared with a text file (on the filesystem), git could
> re-process the blob and get the "sha1 of the canonical stripped
> version". In all other situations, the original SHA1 should be
> correct, since git already removes CRs from the line endings in files
> it recognizes as text.
>
> I would think that this solution would have no performance penalty for
> "fixed" repositories. (It would only have a small performance hit when
> binary blobs are compared against text files, which is rare even in
> broken repositories.) Git could even throw a warning like: "File
> xyz.txt was originally stored as a binary blob."
>
> What do you think?
> ---John

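
For anyone following along, here is a rough Python sketch of the comparison proposed in the quoted message. It is only an illustration, not how git works: the `stored_as_binary` flag is exactly the hypothetical metadata the idea asks for (git does not record it), and `matches_worktree` and `normalize_crlf` are made-up helper names. The only part taken from git itself is the blob hash, which is the SHA-1 of a `blob <size>\0` header followed by the content.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 of a git blob object: 'blob <size>\\0' header plus the content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def normalize_crlf(data: bytes) -> bytes:
    """Simplified stand-in for git's text conversion: turn CRLF into LF."""
    return data.replace(b"\r\n", b"\n")


def matches_worktree(stored_blob: bytes, stored_as_binary: bool,
                     worktree_file: bytes, worktree_is_text: bool) -> bool:
    """Hypothetical comparison from the proposal above.

    stored_as_binary is the imagined per-blob metadata; git has no such flag.
    """
    # The working tree side is canonicalized only if it is treated as text.
    canonical = normalize_crlf(worktree_file) if worktree_is_text else worktree_file

    if stored_as_binary and worktree_is_text:
        # The rare case: re-process the stored blob and compare the
        # "sha1 of the canonical stripped version" on both sides.
        return git_blob_sha1(normalize_crlf(stored_blob)) == git_blob_sha1(canonical)

    # Every other case: the SHA-1 already recorded for the blob is used as-is.
    return git_blob_sha1(stored_blob) == git_blob_sha1(canonical)
```

With this sketch, a CRLF file that was committed without conversion and flagged as binary would compare equal to the same file in the working tree, while the same blob without the flag would show up as modified.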
I'm going to reply to myself, to save you the trouble of replying.
(You've been very helpful and I do appreciate it.)
I guess the problem with this idea is that git doesn't have any way to
distinguish between binary blobs and text files in the repository. I
think that would be useful information, but that bridge burned a long
time ago, so any metadata would have to be stored separately. Jeff,
that's roughly equivalent to your idea of caching, which would take a
lot of work to implement.
Thank you so much for helping me to understand the reason git behaves
the way it does. It's a great tool!
---John