Re: Comments on Presentation Notes Request.

"Tim Visher" <tim.visher@xxxxxxxxx> · Wed, 7 Jan 2009 11:11:43 -0500

Thanks for the suggestions so far.  I've updated the notes.

@Peff: Thanks especially for pointing me towards Junio's
presentation.  That's an excellent source.

Here's the patch for your suggestions:

diff --git a/scmOutline.txt b/scmOutline.txt
index 1791fa0..d25198c 100644
--- a/scmOutline.txt
+++ b/scmOutline.txt
@@ -1,4 +1,4 @@
-SCM: Distributed, Centralized, and Everything in Between.

+SCM: Centralized, Distributed, and Everything in Between.



 * What is SCM and Why is it Useful?



@@ -20,7 +20,11 @@ Not only is it unlimited, but it's random access.
If you changed a function a w


 Many people can edit the same code base at the same time and know,
without a doubt, that when they pull all those changes together, the
system will merge the content intelligently or inform you of the
conflict and let you merge it.  You don't need to lock files.
Obviously, if there is bad coordination then the possibilities of
conflicts rise, but this should not happen regularly.



-*** Diff Debugging

+*** Software Archeology

+

+With a proper SCMS, it becomes a somewhat trivial operation to
discover the author and reasons for a given change.  This is because
of the rich metadata associated with commits (author, date, complete
change set, diffs, and commentary).  So rather than wandering asking
if anyone remembers doing something and why, you simply commit that
information into the system and then refer to it when you need to.

+

+**** Diff Debugging



 You can find where a bug was introduced by learning how to reproduce
the bug and then doing a binary chop search back through the History
to come to the exact commit that introduced the bug.



@@ -30,11 +34,11 @@ You can find where a bug was introduced by
learning how to reproduce the bug and


 The more you commit, the more fine grained control you have over the
undo feature of SCM.  Most documents that I have read suggested a TDD
approach wherein you commit whenever you have written just enough code
for your test to pass. But...



-** Don't Commit Broken Code (To the Public Tree)

+** Don't _Publish_ Broken Code



 Of primary concern is the fact that your central HEAD should _always_
build.  This is why practices like Continuous Integration and TDD are
so important.  TDD gives you the freedom to be sure that a change you
made hasn't broken anything you weren't expecting it to break.
Continuous Integration allows you to be sure that your whole system
will build every time.  Thus, you should _never_ commit broken code to
the (public) tree.



-Of course, in a centralized system, committing is intrinsically
public.  Even on branches, every time you commit any sort of change,
everyone is able to see it and so you could be breaking the build for
someone (even if it's just yourself and the build system).  One of the
nice features of a distributed system is that your public/private
ontology is much richer and thus allows you to have broken code in
your SCMS.

+Of course, in a centralized system, committing is intrinsically
public.  Even on branches, every time you commit any sort of change,
everyone is able to see it and so you could be breaking the build for
someone (even if it's just yourself and the build system).  One of the
nice features of a distributed system is that your public/private
ontology is much richer and thus allows you to have broken code in
your SCMS, so long as you haven't published it, at no penalty to
anyone but yourself.



 ** Whole Hog



@@ -130,7 +134,9 @@ Once you've published, however, not much changes.
Almost everything except upda


 *** Natural Backup



-Because every developer has a copy of the repository, every developer
you add adds an extra failure point.  The more developers you have,
the more backups you have of the repository.

+Because every developer has a copy of the repository, every developer
you add adds an extra layer of redundancy.  The more developers you
have, the more backups you have of the repository.

+

+An important point to make clear here is that you are only backing up
what everyone is duplicating.  If you have 10 unpublished branches
that no one else has cloned, then those are obviously not backed up.
However, the idea here would be that anything that is being developed
actively by multiple people is backed up by as many developers.  Other
than that, your private data must be backed up by you (which is what
you do anyway, right? ;).



 *** Must Learn New Work Flows.



@@ -148,6 +154,8 @@ This bears some explanation.  Within a distributed
system, you can have a single


 Git's implementation just happens to be wickedly fast.  It's faster
than mercurial, it's faster than bazaar, etc.  Everything, committing,
merging, viewing history, branching, and even updating and pushing
are all faster.



+This is much more important than just shaving a few seconds off the
operations.  Because Git is so much faster, you begin to do things
differently because of how fast it is.  Git's blazing fast branching
and merging wouldn't matter at all if you never branched and merged
(which is possible), but because they're blazing fast you _should_ begin
to branch and merge much more often, which _does_ fundamentally
change the way you develop your code (hopefully for the better).

+

 ** Tracks Content, not Files



 Git tracks content, not files, and it's the only SCMS at the moment
that does this.  This has many effects internally, but the most
apparent effect I know of is that for the first time Git can easily
tell you the history of even a function in a file because Git can tell
you which files that function existed (or does exist) in over the
course of development.

@@ -171,9 +179,9 @@ This is very powerful yet somewhat awkward to
grasp.  Basically, the upshot of t


 I've found this to be particularly useful when working with an
existing code base that was not properly formatted.  Often, I'll come
to a file that has a bunch of wonky white space choices and improperly
indented logical constructs and I'll just quickly run through it
correcting that stuff before continuing with the feature I was working
on.  Afterwards, I'll stage the formatting and commit it, and then
stage the feature I was working on and commit that.  You may not want
that kind of control (and if you don't, you don't need to use it), but
I like it.



-** Excellent Merge algorithms

+** Stupid but _Fast_ Merge Algorithms



-Git has excellent merge algorithms.  This is widely attributed and
doesn't require much explanation.  It was one of Git's original design
goals, and it has been proven by Git's implementation.  Merging in Git
is _much_ less painful than in other systems.

+Merging in Git is _much_ less painful than in other systems.  This is
mainly because of how fast it is and how much data it remembers when
it does a merge.  As opposed to CVS which can't merge a branch twice
because it doesn't remember where the last merge happened, Git keeps
track of that information so you can merge between branches as much as
you want.  Git's philosophy is to make merging as fast and painless as
possible so that you merge early and often enough to not develop
really bad conflicts that are nearly impossible to resolve.



 ** Has powerful 'maintainer tools'



@@ -196,3 +204,4 @@ Git guarantees absolutely that if corruption
happens, you will know about it.  I
 - <http://svnbook.red-bean.com/> - Rolling publish book on
Subversion.  Chapter 1 is a good introduction to general centralized
SCM concepts and principles.

 - <http://www.perforce.com/perforce/bestpractices.html> - An
excellent set of best practices from the Perforce team.  Some of it
(especially the branches) has a distinct centralized lean, but most of
it is quite good.

 - <http://www.bobev.com/PresentationsAndPapers/Common%20SCM%20Patterns.pdf>
- Interesting presentation by Pretzel Logic from 2001 attempting to
outline some common SCM best practices as Patterns.

+- <http://members.cox.net/junkio/200607-ols.pdf> - A presentation by
Junio Hamano (the Git maintainer) at a Linux symposium on what Git is
with some tutorials.


I've also attached it as a file.  It was generated by `git diff -p`.
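
For the Diff Debugging section in the patch above, here is a rough
sketch of the binary chop it describes.  This is only an illustration
and not how Git implements it (Git automates the idea with
`git bisect`).  The names `first_bad_commit`, `is_bad`, and the commit
ids c0..c7 are made up for the example, and it assumes the history is a
straight line ordered oldest to newest, that the newest commit shows
the bug, and that the bug never disappears once it is introduced.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumptions (hypothetical, for illustration only): commits are
    ordered oldest to newest, the last commit is known to be bad, and
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[hi] is known to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the bug was introduced at mid or earlier
        else:
            lo = mid + 1    # the bug was introduced after mid
    return commits[lo]


# Made-up linear history in which the bug first appears in "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
bad_from = history.index("c5")


def reproduces_bug(commit: str) -> bool:
    # Stand-in for "check out this commit and try to reproduce the bug".
    return history.index(commit) >= bad_from


print(first_bad_commit(history, reproduces_bug))
```

Each pass halves the range that is left, so even a history of a
thousand commits needs only about ten attempts at reproducing the bug.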

I'm also looking for any place where I'm technically inaccurate.
Unfortunately, I've written a lot of this from things that I've either
read or heard.  I'm mainly experienced with VSS and Subversion (and
both of those to a very small degree), and making a lot of progress
with Git.  I've kind of been swept away by all the energy surrounding
git right now, though, so I'm sure my judgement is somewhat clouded.

Thanks again for your help!

-- 

In Christ,

Timmy V.

http://burningones.com/
http://five.sentenc.es/ - Spend less time on e-mail
Attachment:
suggestionsPatch01

Description: Binary data