Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

Karsten Blees <karsten.blees@xxxxxxxxx> · Thu, 24 Oct 2013 02:57:15 +0200

Am 19.10.2013 16:08, schrieb Philip Oakley:
> From: "Karsten Blees" <karsten.blees@xxxxxxxxx>
>> Am 15.10.2013 00:29, schrieb Felipe Contreras:
>>> tl;dr: everyone except Junio C Hamano and Drew Northup agrees; we
>>> should move
>>> away from the name "the index".
>>>
>>> It has been discussed many times in the past that 'index' is not an
>>> appropriate description for what the high-level user does with it,
>>> and
>>> it has been agreed that 'staging area' is the best term.
>>>
>>
>> I haven't followed the previous discussion, but if a final conclusion
>> towards 'staging area' has already been reached, it should probably be
>> revised.
> 
> Do you mean that how that conclusion was reached should be summarised,
> or that you don't think it's an appropriate summary of the broader
> weltanschauung?
> 

The latter. I don't know about 'broader', but I'll try to summarize _my_ world view:

(1) Audience matters

For actual users, we need an accurate model that supports a variety of use cases without falling apart. IMO, a working model is more important than simplicity. Finally, its more important to agree on the actual model than on a vague term that can mean many things (theater stage vs. loading dock...).

For potential users / decision makers, we need to describe git's features in unmistakable terms that don't need extra explanation. In this sense, the index / cache / staging area is not a feature in itself but facilitates a variaty of broader features:
- fine grained commit control (via index (add -i), but also commit -p, commit --amend, cherry-pick, rebase etc.)
- performance
- merging

(2) Index

An index, as in a library, maps almost perfectly to what the git index is _and_ what we do with it. No, I don't mean .so/.dll/.lib files, I'm talking about the real thing with shelves of books and a big box with index cards (aka the index).

The defining characteristic of a book (or publication in general) is its content, not its physical representation (paper). There are typically many indistinguishable copies of the same book. An author can continue working on the manuscript without affecting the copy at the library at all.

When a new or updated publication is submitted to the library, it is first added to the index and placed on a cart at the reception desk. Some time later, the librarian commits the content of the cart to the shelves. A user of the library will typically consult the index to lookup information or to check if his personal copy of a publication is up to date. The index can be thrown away and rebuilt from the content of the shelves. A big library may have a central repository and several local branches (aka field offices) that can be synchronized by comparing their indexes card by card.

Granted, a library is typically not versioned, and its unlikely that any one user will have checked out a full copy of the library's content. But otherwise, its pretty similar to git...

(3a) Staging area (logistics)

A staging area, as in (military) logistics / transportation, is about moving physical goods around. You move an item from your stock to the staging area, then onto the truck and finally deliver it to the customer.

The defining characteristic of a physical good is its physical existence. Each item is uniquely identifiable by a serial number. There may be many of the same kind, but there are no exact copies.

Problem #1: If an item in the staging area is broken, you fix it directly in the staging area, because that's where it _is_. Thus you also don't need to stage the item again. That's how conventional SCMs work: they track the identity (serial number, file name) of things.

Problem #2: The transportation model only supports additions. You cannot add an item to your staging area that, upon delivery, will magically remove itself from the possession of the customer. Let alone that you'd have to steal it first to be able to physically place it into your staging area.

This can be fixed by slightly modifying our mental model: instead of real things, lets think about "staging changes" (or deltas, or patches). Again, that's what conventional SCMs do and what git exactly does _not_ do.

Problem #3: In logistics, the state / inventory of the customer is irrelevant. If a customer orders an item he already has, its his problem. There's no need for core commands like status, diff or reset, and there's no way to explain what they do with a staging area model. What if a customer buys at another shop without telling us, effectively changing his inventory (git reset --soft)? This shouldn't affect our staging area at all, right? But with git it does...ooops.

(3b) Staging area (other meanings)

I don't see how a stage (as in a theater) is in any way related to the git index.

Data staging (as in loading a datawarehouse or web-server) fits to some extent, as its also about copying information, not moving physical things.

[...]
>>
>> 1.) Recording individual files to commit in advance (instead of
>> specifying them at commit time). Which isn't that hard to grasp.
> 
> For many, that separation of preparation(s), from the final action, is
> brand new and difficult to appreciate - it's special to computer systems
> (where copying is 100% reliable, essentially instantaneous, and in Git's
> case, 100% verifiable via crypto checksums).
> 

I'll try to remember that next time I write a shopping list... :-)

[...]
> Even 'native' speakers don't have a single consistent term for the
> concept. Terms are stolen from many varied industries and activities
> that have to prepare and package items (Ships, Trains, Theaters)
> (see http://en.wikipedia.org/wiki/Shipping_list, for a shortish list, which doesn't mention an Index)

All true, but we don't need to steal terms from unrelated fields if information science provides us with the terms we need.

[...]
> 
> In one sense even that is not the right term - If compared to a book /
> pamphlet / monograph (being placed in a Library / repository)  it's more of a contents list (by chapter and verse / directory and file), with various bits of front matter such as author, publisher, previous editions, introductory preface, dates, contents list, and finally content. A book's 'index' is a supplementary mini grep of useful terms that the reader may wish to find.
> 

Yes, a book's index is not the right meaning, as is stock market index or index finger. However, a library index seems to fit quite well.

By the same logic, I could argue that a file in git is not used as a tool to shape metal, therefore its not a file. Lets call it "costume", because a costume in a theater wraps an actor just like a file wraps content.</irony>

> All in all it's difficult to undo this Gordian knot of confusions.
> 
>>
>> Just my 2 cents
>> Karsten
>>
> The key is probably to separate the devlopers concerns over implementation details from the user's big picture view, in an arena that is short of well (commonly) understood terms.
> 

Yes, see my point about audience. Its probably also helpful to distinguish between unbiased SCM newbies and "braindamaged" VSS/CVS/SVN folks like me :-)

> Philip
> 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html