Re: [PATCH] Teach "git add" and friends to be paranoid

On Thu, Feb 18, 2010 at 08:27:28AM +0100, Wincent Colaiuta wrote:
> Shouldn't a switch that hurts performance and is only needed for insane use cases default to off rather than on?

While I don't disagree that default off might(*) be a good idea,
I do object to the categorization of this use case as 'insane'.

Neither the documentation for 'git add' nor that for its various aliases
(e.g. 'git commit' with paths or -a) mentions that any use of 'git add'
might cause repository corruption under any circumstances.  Contrast with
the examples of repository-corrupting pitfalls in the man pages of tools
such as 'git clone' and 'git gc'.

In fact, the language in the git add man page seems to suggest the
opposite, using words like "snapshot" and pointing out several times
that the index is intentionally immune to changes interleaved between
'git add' and 'git commit' commands.

Common sense (for Unix users) is that the index is not immune to changes
*during* git add, but nowhere in my wildest nightmares would I conceive
that changes in file contents during git add would *corrupt the
repository* and git would *fail to notice or give useful diagnostics*
until *days or weeks later* after the corruption has already *spread to
multiple repositories* through *git push with default options*.

Now, if you want to put that text in the man pages of 'git add' and
friends, and point out the paranoia switch there, I have nothing to
object to.
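
To make the failure mode concrete, here is a minimal sketch of it.  This
is not git's actual code: racy_add, verify and changing_file are
hypothetical names, and the two separate reads stand in for an mmap'd file
whose pages change between the hashing pass and the compression pass.  The
only git facts assumed are that a blob's name is the SHA-1 of a
"blob <size>\0" header plus the content, and that the stored object is the
zlib-deflated form of the same bytes.

```python
import hashlib
import zlib


def encode_blob(data: bytes) -> bytes:
    """Prefix the content with the 'blob <size>\\0' header used for naming."""
    return b"blob %d\x00" % len(data) + data


def racy_add(read_file):
    """Hypothetical two-pass add: name from one read, stored bytes from another."""
    name = hashlib.sha1(encode_blob(read_file())).hexdigest()
    stored = zlib.compress(encode_blob(read_file()))
    return name, stored


def verify(name: str, stored: bytes) -> bool:
    """Re-hash the stored bytes, as a later integrity check would."""
    return hashlib.sha1(zlib.decompress(stored)).hexdigest() == name


def changing_file():
    """Simulate a file that another process rewrites between the two reads."""
    versions = iter([b"version 1\n", b"version 2\n"])
    return lambda: next(versions)


name, stored = racy_add(changing_file())
print(verify(name, stored))  # False: the object's name does not match its contents
```

Nothing in racy_add itself notices the mismatch; it only shows up when
something re-hashes the stored object later.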

I also see nothing prohibiting concurrent file modification in some
reasonable revision-control use cases.  What happens if I do a 'git
commit -a' on, say, proprietary EDA tool data files or Microsoft Office
documents, and those tools choose an unfortunate moment to automatically
update files under revision control?  Granted, I can't really expect the
repo to contain usable data, but what I do expect is another commit, or
a rebased/amended commit, that fixes the mangled file's contents--not to
be required to rebase everything that comes after the commit onto its
parent, then purge my reflogs so 'git gc' will work again.

Working directories on network filesystems might do all kinds of strange
things, most of which aren't intentional.  It's one thing to commit a
useless tree, and quite another to unintentionally commit an irretrievable
one.

(*) "might" be a good idea because there's been some evidence to suggest
that a paranoid implementation of git add could perform better than the
mmap-based one in all cases, if more work were done on it than anyone
seems willing to do.
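
For comparison, one way a "paranoid" add could look is sketched below.
This is an assumption about the general approach, not the patch under
discussion: paranoid_add is a hypothetical name, and the idea is simply to
read the file once into a private buffer, derive both the name and the
stored bytes from that buffer, and re-hash what was stored before trusting
it.

```python
import hashlib
import zlib


def paranoid_add(read_file):
    """Hypothetical single-read add with a post-write consistency check."""
    data = read_file()  # one read; later changes to the file cannot affect us
    blob = b"blob %d\x00" % len(data) + data
    name = hashlib.sha1(blob).hexdigest()
    stored = zlib.compress(blob)

    # Paranoia: confirm the stored bytes still hash to the name we computed.
    if hashlib.sha1(zlib.decompress(stored)).hexdigest() != name:
        raise RuntimeError("object contents do not match object name")
    return name, stored


name, stored = paranoid_add(lambda: b"some file contents\n")
print(name, len(stored))
```

The result may still be a snapshot of a half-written file, but it is a
self-consistent object that a later commit can fix.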