Re: Hey - A Conceptual Simplication....

Björn Steinbrink <B.Steinbrink@xxxxxx> · Thu, 19 Nov 2009 08:42:34 +0100

On 2009.11.18 21:03:31 -0500, George Dennie wrote:
> Jason Sewell wrote....
> > I find this leads to big, shapeless commits and, as I mentioned
> > before, it seriously limits the utility of 'git bisect'.  I also
> > fail to see how 'selectively saving parts of the document' is
> > versioning and publishing - what is the publishing part?  The act of
> > committing is one thing (and 'saving...
> 
> The notion of a shapeless commit is curious. Intuitively, I consider a
> commit as capturing the state of my work at a transactional boundary
> (i.e. a successful unit test...or even lunch break). However, your
> characterization of "shape" suggest that you are constructing
> something other than the immediate functionality of the software.
> Consequently, your software document is not really the solution files
> alone but also this commit history that you meticulously craft. 

Your "lunch break" as a transaction boundary is a great example of
something that probably most people on this list would consider to
create commits that need rewriting before publishing them. Let's take an
extreme example:

You work on adding a feature to some webmail site that adds colors to
the mail being displayed, using different colors for the headers, quoted
sections and the text from the sender. The colors should be configurable
by the user.

*work*
git commit -m "Go for a coffee"
*work*
git commit -m "Lunch break"
*work*
git commit -m "Meeting"
*work*
git commit -m "Time to go home"

*come back to work*
*work*
git commit -m "Finished the mail coloring support"

This gives you:

* Finished the mail coloring support
|
* Time to go home
|
* Meeting
|
* Lunch break
|
* Go for a coffee

Such a history is basically completely useless. It's (ab)using the VCS
as a plain code dump. In a week, you'll be able to see that you had a
meeting that day, but it doesn't tell you anything about what you did to
the project. And even with less "insane" commit messages, the
"transactional boundaries" are totally arbitrary. They're aligned to
things you did that have absolutely nothing to do with the stuff you're
tracking in your VCS.

A far more useful history might look like this:

* Colorize quoted text in a mail, depending on its quoting depth
|
* Parse mails into a tree structure to represent sections of quoted text
|
* Colorize mail headers
|
* Add support for the user to change the colors used for mails
|
* Add configuration variable for the colors used for mails

At each step, something functionally changed about the software. The
commit messages tell you something about how the software evolved. And
if you get bogus values for the colors in the configuration, you can be
90% sure, by only looking at the commit messages, that you have a bug in
the "Add support for the user to change the colors ..." commit, and not
in one of the others. So you can run "git show $that_commit" to see the
diff of the changes you made in that commit and quickly check them for
your bug.

And while that's not sooo useful for commits that added new
functionality, it's extremely useful for commits that just made small
changes to existing functionality. Finding a bug in a large piece of
code (say 2000 lines) isn't trivial. But if you know that a commit that
changed 5 lines in that code is responsible for the breakage, all you
have to do is to identify the faulty change, which is a lot easier.

And with a large history, where it's not obvious in which commit
something got broken, "git bisect" can help to quickly find the bad
commit. Now consider "git bisect" finding your "Lunch break" commit.
Looking at the commit message tells nothing. The diff is pretty much
arbitrary, might be huge. Not much help. Finding the "Add support for
the user to change the colors ..." commit already tells you something
just because of the commit message. And the diff is about just one
specific change. It's all nicely separated, and that's a huge value.

Using git and producing nice commits is about _documenting_ the history
of your code. And having small, self-contained and well separated
commits is key to that.

And the index can be a great help with that. Given the above example,
you might already have some code to use the configured colors, just for
testing, so things aren't so boring. Maybe even some hack-up of the
code you'll be using later. If that part of the code would be committed
right away, you'd mess up your commit, because it wouldn't be about a
single change anymore, but would also have your testing code in there.
Bad.

But you don't want to throw the testing code away either, because it's
useful right now, and you might need it later, because it might evolve
into the final code used for the actual coloring. So, what now? You have
code that you want to commit, and some code you don't want to commit,
and which needs to go away temporarily, so you can test without it. No
problem, here comes the index.

Say you have:
config.c     # Has changes for the colors
show_mail.c  # Has changes to use the colors
whatever.c   # Has some changes for both

You do:
git add config.c       # Add to the index
git add -p whatever.c  # Only add some hunks to the index

So now the index has what you want to commit, and the working tree still
has everything.

git stash save --keep-index

Now your working tree and index only have the things you want to commit.
You run your unit tests, everythings fine. You commit and get a nice
clean commit, for which you write a useful commit message.

git stash pop

You've got your changes back that you didn't want to commit just yet,
and you can continue working.

Another use-case I have found for myself is to use the index to separate
reviewed and not-yet-reviewed changes. Before I commit, I always review
the diff of the things I'm going to commit. So I start out with "git
diff" and start reading. When I finished reviewing a file, I can do "git
add $that_file", so the diff for that file will no longer be shown by
"git diff". That nicely cuts down the size of the "git diff" output to
things I'm still interested in. Quite useful when you are forced to do a
large commit, because you did some refactoring. If I find a bug during
the review, I can fix that and re-run "git diff", which will only show
changes to me that I didn't declare as "good" already by adding them to
the index.

Sure, it takes some pratice and discipline to generate a nice, useful
history. But that's not much different from writing code. Others will
hate you for writing unreadable spaghetti code, and so will they hate
you for producing a useless history that tells them that you had lunch,
instead of telling them what you did to the code ;-)

Björn
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html