At Wed, 2 Dec 2009 03:10:21 +0300, Dmitry Potapov <dpotapov@xxxxxxxxx> wrote:
Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
>
> My point was that I do not see why you believe "git archive" is more
> expensive than "git clone". Accordingly to Jeff Epler's numbers,
> "git archive" is 20% faster than "git clone"...

Really!?!?!? You don't see it? Why is this so hard to understand?

Sorry for my incredulity, but I thought this issue was obvious.

The slightly more expensive "git clone" happens only _ONCE_. After that
you just run "git pull" I think (plus maybe "git reset --hard"?), but in
any case it's a heck of a lot less I/O and CPU than "git archive".

And of course you skip even the one-time "git clone" operation if you
use the even faster and simpler git-new-workdir script.

"git archive" has to be run _EVERY_ time you need to update a working
directory and it currently has no choice but to toss every bit of the
whole working directory, up from the filesystem, across a pipe, and back
down to the filesystem. It literally couldn't be more expensive!

Sure, no matter how you do it, updating the working directory might not
always be the biggest part of the operation, but it's insane to use the
most expensive mechanism ever possible when there are far cheaper
alternatives.

BTW, there cannot, and MUST NOT, be any integrity advantage to using
"git archive" over using multiple working directories. "git archive
branch" must, by definition, produce exactly the same result as if you
did "git checkout branch; rm -rf .git" or else it is buggy.

Note also that the build directories created with git-new-workdir can be
treated as read-only, and perhaps even forced to be read-only by mount
options or maybe just by a corporate policy directive.
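As a rough sketch of the two update strategies compared above (not anything taken from a real build setup), the following defines one shell function per approach. The function names, argument order, and directory layout are all hypothetical, and the second function uses "git fetch" followed by "git reset --hard" as one possible reading of the "git pull (plus maybe git reset --hard)" step.

```shell
#!/bin/sh
# Sketch only: function names and arguments (repo, branch, dest) are
# hypothetical and chosen for illustration.

# Strategy 1: re-export the whole tree on every update.
# Every tracked file travels from the object store, through a pipe, and is
# written out again by tar, whether or not it changed since the last build.
update_via_archive() {
    repo=$1 branch=$2 dest=$3
    rm -rf "${dest:?}" && mkdir -p "$dest" &&
        git -C "$repo" archive "$branch" | tar -xf - -C "$dest"
}

# Strategy 2: clone once, then update incrementally.
# After the first call only new objects are fetched and only changed files
# are rewritten in the working directory.
update_via_pull() {
    repo=$1 branch=$2 dest=$3
    if [ ! -d "$dest/.git" ]; then
        # One-time cost.
        git clone --quiet --branch "$branch" "$repo" "$dest"
    else
        git -C "$dest" fetch --quiet origin &&
            git -C "$dest" reset --quiet --hard "origin/$branch"
    fi
}

# Example (paths are placeholders):
#   update_via_archive /srv/repos/project master /build/export
#   update_via_pull    /srv/repos/project master /build/workdir
#   diff -r -x .git /build/export /build/workdir
```

Provided the repository sets no "export-ignore" or "export-subst" attributes, the final diff should report no differences, which is the equivalence claimed above. git-new-workdir would avoid even the one-time clone; it is not shown here.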

(in all projects I'm working on the source tree can be read-only --
product files are always generated elsewhere)

> Multiple copies of the same repo is never a problem (except taking some
> disks space).

Exactly -- gigabytes of disk space per copy in the cases I'm concerned
about (i.e. where hard links are impossible).

I've heard that at least one very large project has an 8GB repository
currently. Three of the large projects I work on now are about a
gigabyte per copy. That's just what's under .git too, not including the
whole working directory as well. I can't even manage a "git clone" from
HTTP of one of them without increasing my default process limits as it
is so big and uses up too much memory.

I guess one could skip the initial more-expensive "git clone" operation
by copying the repo using low-level bit moving commands, like "cp -r" or
whatever, and then tweak the result to make it appear as if it had been
cloned, but even that requires moving gigabytes of data unnecessarily
across what is likely to be a network connection of some sort.

Are you fighting against git-new-workdir, or the concept of multiple
working directories?

> > A major further advantage of multiple working directories is that this
> > eliminates one more point of failure -- i.e. you don't end up with
> > multiple copies of the repo that _should_ be effectively read-only for
> > everything but "push", and perhaps then only to one branch.
>
> I really do not understand why you say that some copies
> should be effectively read-only... You can start to work on some feature
> at one place (using one repo) and then continue in another place using
> another repo. (Obviously, it will require to fetch changes from the
> first repo, before you will be able to continue, but it is just one
> command). In other words, I really do not understand what are you
> talking about here.

Developers, especially more junior ones, work on code, and they (are
supposed to) spend almost all of their intellectual energy on the issues
to do with creating and modifying code -- they are not expected to be
integration engineers, nor are they expected to be VCS and SCM experts.
The more steps you put in place for them to do, and the more places you
allow them to store changes, etc., etc., etc., the more mistakes that
they will make.

Besides, in some scenarios build directories will be checked out from
integration branches which shouldn't have any direct commits made to
them, especially not to fix a problem in a build.

BTW, pkgsrc has well over 50,000 files, FreeBSD ports is over 100,000.
Neither can really be split in any rational way.

--
Greg A. Woods
Planix, Inc.
<woods@xxxxxxxxxx>  +1 416 218 0099  http://www.planix.com/