On Monday, 27 October 2008 02:52:22, Jakub Narebski wrote:
> On Mon, 27 Oct 2008, Arne Babenhauserheide wrote:
> > On Sunday, 26 October 2008 19:55:09, Jakub Narebski wrote:
> > > I agree, and I think it is at least partially because of Git having
> > > cleaner design, even if you have to understand more terms at first.
> >
> > What do you mean by "cleaner design"?
>
> Clean _underlying_ design. Git has very nice underlying model of graph
> (DAG) of commits (revisions), and branches and tags as pointers to this
> graph.
>
> > From what I see (and in my definition of "design"), Mercurial is designed
> > as VCS with very clear and clean design, which even keeps things like
> > streaming disk access in mind.
>
> I have read description of Mercurial's repository format, and it is not
> very clear in my opinion. File changesets, bound using manifest, bound
> using changerev / changelog.

This becomes very simple if you keep a common filesystem layout in mind:
inodes and data nodes (the files in the store), organized in directories
that hold many files (the manifests), bound together in changesets that
carry additional data.

> Mercurial relies on transactions and O_TRUNC support, while Git relies
> on atomic write and on updating data then updating reference to data.

For most operations Mercurial just relies on append support.

> I don't quite understand comment about streaming disk access...

If you tell a disk "give me files a, b, c, d, e, f (of the whole abc)", it
is faster than if you tell it "give me files a, k, p, q, s, t", because the
filesystem can optimize that call more easily. That's why, for example,
Mercurial avoids hashing filenames.

> Well, they have to a lot less than they used to, and there is
> "git gc --auto" that can be put in crontab safely.

That means relying on crontab, which might not be available on all systems
(I only use GNU/Linux, but what about friends of mine who have to use
Windows?).

> Explicit garbage collection was a design _decision_, not a sign of not
> clear design.
> We can argue if it was good or bad decision, but one
> should consider the following issues:
>
> * Rolling back last commit to correct it, or equivalently amending
>   last commit (for example because we forgot some last minute change,
>   or forgot to signoff a commit), or backing out of changes to the
>   last commit in Mercurial relies on transactions (and locking) and
>   correct O_TRUNC, while in Git it leaves dangling objects to be
>   garbage collected later.

As far as I know, the only problem with O_TRUNC was that it sadly had bugs
in Linux.

> * Mercurial relies on transaction support. Git relies on atomic write
>   support and on the fact that objects are immutable; those that are
>   not needed are garbage collected later. Beside IIRC some of ways of
>   implementing transaction in databases leads to garbage collecting.

But Mercurial normally works on standard filesystems, so this isn't the
case for normal operations.

You could say, though, that git implements a very simple transaction model:
keep all old data until it gets purged explicitly.

> * Explicit packing and having two repository "formats": loose and
>   packed is a bit of historical reason: at the beginning there was
>   only loose format. Pack format was IIRC invented for network
>   transport, and was used for on disk storage (the same format!) for
>   better I/O patterns[1]. Having packs as 'rewrite to pack' instead
>   of 'append to pack' allows to prefer recency order, which result in
>   faster access as objects from newer commits are earlier in delta
>   chain and reduction in size in usual case of size growing with time
>   as recency order allows to use delete deltas. Also _choosing_ base
>   object allows further reduce size, especially in presence of
>   nonlinear history.

So having multiple packs is equivalent to the automatic snapshot system in
Mercurial, which doesn't need user interaction.
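
To make the "automatic snapshot" idea concrete, here is a minimal Python
sketch of an append-only log that stores each revision as a delta against
the previous one and writes a full snapshot by itself once the delta chain
gets too expensive. This is only an illustration under stated assumptions,
not Mercurial's actual storage format or API: the names SnapshotLog,
make_delta and apply_delta are made up, the prefix/suffix delta is a toy
encoding, and the "chain larger than factor times the text" rule is an
assumed threshold.

```python
def make_delta(old: bytes, new: bytes):
    """Toy delta: (common prefix length, common suffix length, new middle)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new text from the old text and a toy delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


class SnapshotLog:
    """Hypothetical append-only store: deltas plus automatic full snapshots."""

    def __init__(self, factor: int = 2):
        self.entries = []       # ("full", text) or ("delta", delta), append-only
        self.factor = factor    # assumed threshold, not a real Mercurial setting
        self._tip = b""         # text of the newest revision
        self._chain_bytes = 0   # delta bytes stored since the last snapshot

    def append(self, text: bytes) -> int:
        delta = make_delta(self._tip, text)
        cost = self._chain_bytes + len(delta[2])
        if not self.entries or cost > self.factor * len(text):
            # Chain too long (or first revision): store a full snapshot.
            self.entries.append(("full", text))
            self._chain_bytes = 0
        else:
            self.entries.append(("delta", delta))
            self._chain_bytes = cost
        self._tip = text
        return len(self.entries) - 1

    def read(self, rev: int) -> bytes:
        # Walk back to the nearest snapshot, then replay deltas forward.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        text = self.entries[base][1]
        for _kind, delta in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text
```

Reading a revision never has to go back further than the last snapshot, and
nothing already written is ever rewritten, so no separate packing step is
needed in this model.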
> * From what I understand Mercurial by default uses packed format for
>   branches and tags; Git uses "loose" format for recent branches
>   (meaning one file per branch), while packing older references.
>   Using loose affects performance (and size) only for insane number of
>   references, and only for some operations like listing all references,
>   while using packed format is IMHO a bit error prone when updating.

As far as I know, Mercurial got that "using packed format" right from the
beginning.

> * Git has reflogs which are pruned (expired) during garbage collecting
>   to not grow them without bounds; AFAIK Mercurial doesn't have
>   equivalent of this feature.
>
>   (Reflogs store _local_ history of branch tip, noting commits,
>   fetches, merges, rewinding branch, switching branches, etc.)

As far as I know, Mercurial only tracks the state of the working directory,
so it doesn't track your whole local history. But others can tell you about
that in greater detail.

> [1] You wrote about "streaming disk access". Git relies (for reading)
>     on good mmap implementation.
>
> > In git it has to check all changesets which affect the file.
>
> I don't understand you here... if I understand correctly above,
> then you are wrong about Git.

I might be remembering incorrectly what git does. Are its commits "the whole
changed file" or "the diff of the changes"? If the latter, it needs to walk
back through all commits to the snapshot revision to get the file data.

One story I experienced with that: My amd64 GNU/Linux box suffers from
performance problems when it gets high levels of disk activity (something
about the filesystem layer doesn't play well with amd64; others have
reported this, too). When I pulled the Linux kernel repository with git half
a year ago, my disk started clicking and the whole computer slowed down to a
crawl. When I pulled the same repository data from a Mercurial repository,
the computer kept running smoothly, and the disk stayed silent and happily
wrote the data.
Mercurial felt smooth, while git felt damn clumsy (though not slow).

> > 1) Hg is easy to understand
>
> Because it is simple... and less feature rich, c.f. multiple local
> branches in single repository.

That works quite well. People just don't use it very often, because the
workflow of having multiple repositories is easier with hg.

> > 2) You don't have to understand it to use it
>
> You don't have to understand details of Git design (pack format, index,
> stages, refs,...) to use it either.

I remember that being untrue about half a year ago, when I stumbled over
many problems in git whenever I tried to do something a bit nonstandard. It
took me hours (and in the end asking a friend) to find out about
"git checkout ." just to get back my deleted files.

The answer I got when I asked why it's done that way was "this is because of
the inner workings of git. You should know them if you use it".

> > And both are indications of a good design, the first of the core, the
> > second of the UI.
>
> Well, Git is built around concept of DAG of commits and branches as
> references to it. Without it you can use Git, but it is hard. But
> if you understand it, you can understand easily most advanced Git
> features.
>
> I agree that Mercurial UI is better; as usually in "Worse is Better"
> case... :-)

What do you mean by that?

Best wishes,
Arne

-- My stuff: http://draketo.de - stories, songs, poems, programs and stuff :)
-- Infinite Hands: http://infinite-hands.draketo.de - singing a part of the history of free software.
-- Ein Würfel System: http://1w6.org - simply clean (role-playing) rules.
-- PGP/GnuPG: http://draketo.de/inhalt/ich/pubkey.txt