On Tue, Dec 01, 2009 at 05:44:15PM -0500, Greg A. Woods wrote:
> At Wed, 2 Dec 2009 00:18:30 +0300, Dmitry Potapov <dpotapov@xxxxxxxxx> wrote:
> Subject: Re: multiple working directories for long-running builds (was: "git merge" merges too much!)
> >
> > AFAIK, "git archive" is cheaper than git clone.
>
> It depends on what you mean by "cheaper"

You said:
>>> "git archive" is likely even more
>>> impossible for some large projects to use than "git clone"

My point was that I do not see why you believe "git archive" is more
expensive than "git clone". According to Jeff Epler's numbers, "git
archive" is 20% faster than "git clone"...

> It's clearly going to require
> less disk space. However it's also clearly going to require more disk
> bandwidth, potentially a _LOT_ more disk bandwidth.

Well, it does, but as I said earlier, this is not a case where I expect
things to be instantaneous. If you run a full build and test, it is
going to take a lot of time anyway, and git-archive is not likely to be
a big overhead in relative terms.

> > I do not say it is fast
> > for huge project, but if you want to run a process such as clean build
> > and test that takes a long time anyway, it does not add much to the
> > total time.
>
> I think you need to try throwing around an archive of, say, 50,000 small
> files a few times simultaneously on your system to appreciate the issue.
>
> (i.e. consider the load on a storage subsystem, say a SAN or NAS, where
> with your suggestion there might be a dozen or more developers running
> "git archive" frequently enough that even three or four might be doing
> it at the same time, and this on top of all the i/o bandwidth required
> for the builds all of the other developers are also running at the same
> time.)

First of all, I do not see why it should be done frequently. The full
build and test may be run once or twice a day, and it may take an hour.
git-archive will probably take less than a minute for 50,000 small
files (especially if you use tmpfs). In other words, it is about 1%
overhead, but you get a clean build that is fully tested. You can be
sure that no garbage is left in the worktree that could influence the
result.

> > > Disk bandwidth is almost always more expensive than disk space.
> >
> > Disk bandwidth is certainly more expensive than disk space, and the
> > whole point was to avoid a lot of disk bandwidth by using hot cache.
>
> Huh? Throwing around the archive has nothing to do with the build
> system in this case.

"git archive" is used to do a full build and test, which is rarely done.
Normally, you just switch between branches, which means only a few
files are changed and rebuilt, and no archive is involved here.

> I'm just not willing to even consider using what would really be the
> most simplistic and most expensive form of updating a working directory
> as could ever be imagined. "Git archive" is truly unintelligent, as-is.

"git archive" is NOT for updating your working tree. You use "git
checkout" to switch between branches. "git checkout" is intelligent
enough to overwrite only those files that actually differ between the
two versions.

> A major further advantage of multiple working directories is that this
> eliminates one more point of failure -- i.e. you don't end up with
> multiple copies of the repo that _should_ be effectively read-only for
> everything but "push", and perhaps then only to one branch.

Multiple copies of the same repo are never a problem (except for taking
some disk space).
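A second local copy is cheap to set up; just as a rough sketch (the
paths below are made up):

  # clone an existing local repository into a second working copy;
  # on the same filesystem git hardlinks the object files, so the
  # extra disk space used is mostly just the checked-out work tree
  git clone /path/to/first/repo /path/to/second/repo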
I really do not understand why you say that some copies should be
effectively read-only... You can start working on some feature in one
place (using one repo) and then continue in another place using another
repo. (Obviously, it requires fetching changes from the first repo
before you can continue, but that is just one command.) In other words,
I really do not understand what you are talking about here.

> I know of at least three very real-world projects where there are tens
> of thousands of small files that really must be managed as one unit, and
> where running a build in that tree could take a whole day or two on even
> the fastest currently available dedicated build server. Eg. pkgsrc.

Tens of thousands of files should not be a problem... For instance, the
Linux kernel has around 30 thousand files, and Git works very well in
that case. But I would consider splitting a project if it had hundreds
of thousands...

Dmitry
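P.S. To make the clean-build point above concrete, here is roughly the
kind of thing I have in mind; the scratch directory, branch name, and
build commands below are only examples, not taken from any real setup:

  # export a pristine copy of the branch into a scratch directory
  # (a tmpfs mount would make this step even cheaper)
  mkdir -p /mnt/scratch/clean-build
  git archive --format=tar master | tar -xf - -C /mnt/scratch/clean-build

  # run the full build and test against that clean tree, so no leftover
  # files from previous builds can influence the result
  cd /mnt/scratch/clean-build && make && make test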