Re: [PATCH] Teach "git add" and friends to be paranoid

Zygo Blaxell wrote:
> On Thu, Feb 18, 2010 at 08:27:28AM +0100, Wincent Colaiuta wrote:
>> Shouldn't a switch that hurts performance and is only needed for
>> insane use cases default to off rather than on?
>
> While I don't disagree that default off might(*) be a good idea,
> I do object to the categorization of this use case as 'insane'.

FWIW I think default off would not be a good idea.  This talk of
insane uses started from the idea that git is not so great for taking
automatic snapshots, but as you pointed out, other situations can
trigger this and the failure mode is pretty bad.

> (*) "might" be a good idea because there's been some evidence to suggest
> that a paranoid implementation of git add might perform better than the
> mmap-based one in all cases, if more work was done than anyone seems
> willing to do.

What you are saying here seems a bit handwavy.  If you have some
concrete ideas about what this paranoid implementation should look
like, I encourage you to write a rough patch.  The two patches so far
have indicated the relevant parts of sha1_file.c (index_fd at the
beginning and write_sha1_file at the end of the pipeline,
respectively).  Special cases include:

 - The blob being added to the repository is a special file (e.g.,
   pipe) specified on the 'git hash-object' command line: I think it’s
   fine if this is slow, but it should keep working.

 - The blob was generated in memory (e.g., 'git apply --cached').

 - autocrlf conversion is on.  This means scanning through the file to
   collect statistics on the dominant line ending, then scanning
   through again to convert the file.

 - Some other filter is on.  This means sending the file as input to
   a command, then buffering the output somewhere until its length is
   known, since the length goes at the beginning of the blob header.

 - The blob being added to the repository is already in the repository,
   so it would be a waste of time to compress and write it again.
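
To illustrate why the pipe and filter cases need buffering: a blob's
name is the SHA-1 of "blob <length>\0" followed by the content, so the
length has to be known before the first byte is hashed.  Below is a
rough Python sketch of that, not the code in sha1_file.c; the name
blob_id and the chunk size are made up for illustration.

```python
import hashlib
import io

def blob_id(stream, chunk_size=65536):
    """Return the git blob name for everything readable from `stream`.

    Hypothetical helper: the whole stream is buffered first because the
    header that precedes the content carries the content length.
    """
    chunks = []
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
        size += len(chunk)

    hasher = hashlib.sha1()
    hasher.update(b"blob %d\0" % size)  # header: type, length, NUL
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()

if __name__ == "__main__":
    # A stream of unknown length, standing in for a pipe or filter output.
    print(blob_id(io.BytesIO(b"hello\n")))
```

For a regular file the length could come from stat() instead, which is
exactly where trouble starts if the file changes while it is being read.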

Some of these already don’t have great performance for large files
(autocrlf and filters), and I suspect there is room for improvement
for many of them.
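
As a starting point for discussion, here is one way the paranoid
approach could look, sketched in Python rather than C.  It assumes the
simplest possible strategy: read the file once into a private buffer,
then hash and compress that same buffer, so the stored object always
matches its name even if the file changes underneath.  The add_blob
function and the dict standing in for the object store are
hypothetical, and this ignores memory use for large files entirely.

```python
import hashlib
import zlib

def add_blob(path, store):
    """Hash and store the file at `path`; return its blob name.

    `store` is a plain dict mapping object name -> compressed object,
    a stand-in for the real object database (an assumption).
    """
    with open(path, "rb") as f:
        data = f.read()  # single private copy; later writes cannot affect it

    body = b"blob %d\0" % len(data) + data
    name = hashlib.sha1(body).hexdigest()

    # Already in the repository: skip the compress-and-write step.
    if name not in store:
        store[name] = zlib.compress(body)
    return name
```

The cost is an extra copy of the file in memory, which is the
performance question the original patch raised.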

Jonathan