Re: [PATCH v8 4/4] cache-tree: Write updated cache-tree after commit

Duy Nguyen <pclouds@xxxxxxxxx> · Wed, 16 Jul 2014 17:18:31 +0700

On Tue, Jul 15, 2014 at 11:45 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> What is the real point of "writing into *.lock and renaming"?  It
> serves two purposes: (1) everybody adheres to that convention---if
> we managed to take the lock "index.lock", nobody else will compete
> and interfere with us until we rename it to "index". (2) in the
> meantime, others can still read the old contents in the original
> "index".
>
> With "take lock", "write to a temporary", "commit by rename or
> rollback by delete", we can have the usual transactional semantics.
> While we are working on it, nobody is allowed to muck with the same
> file, because everybody pays attention to *.lock.  People will not
> see what we are doing until we release the lock because we are not
> writing into the primary file.  And people can see what we did in
> its entirety once we are done because we close and rename to commit
> our changes atomically.

True.
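
For concreteness, here is a minimal sketch of that take-lock / write /
commit-or-rollback sequence. It is written in Python rather than git's
C, and the function names (take_lock, commit_lock, rollback_lock) are
made up for illustration; this is not git's lockfile API, only the
convention as described above.

```python
import os
import tempfile


def take_lock(path):
    """Create path + '.lock' exclusively; raises FileExistsError if held."""
    lock_path = path + ".lock"
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    return fd, lock_path


def commit_lock(fd, lock_path, path):
    """Publish the new contents by renaming the lock file over the target."""
    os.close(fd)
    os.replace(lock_path, path)


def rollback_lock(fd, lock_path):
    """Abandon the update; the original file is left untouched."""
    os.close(fd)
    os.unlink(lock_path)


def demo():
    workdir = tempfile.mkdtemp()
    index = os.path.join(workdir, "index")
    with open(index, "wb") as f:
        f.write(b"old")

    fd, lock_path = take_lock(index)
    os.write(fd, b"new")

    # (2) Readers still see the old contents while we hold the lock.
    with open(index, "rb") as f:
        seen_during = f.read()

    # (1) A second writer cannot take the lock.
    try:
        take_lock(index)
        contended = False
    except FileExistsError:
        contended = True

    commit_lock(fd, lock_path, index)
    with open(index, "rb") as f:
        seen_after = f.read()
    return seen_during, contended, seen_after
```

Calling rollback_lock() instead of commit_lock() would leave "index"
with its old contents and remove "index.lock".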

> Think what CLOSE_LOCK adds to that and you would appreciate its
> usefulness and at the same time realize its limitation.  By allowing
> us to flush what we wrote to the disk without releasing the lock, we
> can give others (e.g. subprograms we spawn) controlled access to the
> new contents we prepared before we commit the result to the outside
> world.  The access is controlled because we are in control when we
> tell these others to peek into or update the temporary file "*.lock".
>
> The original implementation of CLOSE_LOCK is limited in that you can
> do so only once; you take a lock, you do your thing, you close, you
> let (one or more) others see it, and you commit (or rollback).  You
> cannot do another of your thing once you close with the original
> implementation because there was no way to reopen.

This is probably where our opinions differ. Yes, if you are sure
nobody else is looking at the lock file any more, then you can do
whatever you want. And because this is a .lock file, nobody is
supposed to look at it unless you tell them to (in contrast,
$GIT_DIR/index can be read at any time). The format of the index makes
it impossible to just edit one byte and be done with it. You always
write a full new file. By sticking to transaction-style update, you
need no extra code, and you have a backup file as a side effect.
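
To illustrate the close-then-reopen flow from the quoted text, here is
a rough sketch in Python (not git's C code; the function name and the
byte strings are placeholders). It assumes "closing" means closing the
descriptor while leaving "index.lock" in place, and "reopening" means
truncating that same file and writing a full new copy.

```python
import os
import tempfile


def close_and_reopen_demo():
    workdir = tempfile.mkdtemp()
    index = os.path.join(workdir, "index")
    lock_path = index + ".lock"
    with open(index, "wb") as f:
        f.write(b"v1")

    # Take the lock and write the first version of the new index.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    os.write(fd, b"v2")
    # Close without releasing: the lock file stays, so the lock is held.
    os.close(fd)

    # A helper we control may now look at the prepared contents.
    with open(lock_path, "rb") as f:
        helper_saw = f.read()

    # Reopen the same lock file and rewrite it in full (no in-place edit).
    fd = os.open(lock_path, os.O_WRONLY | os.O_TRUNC)
    os.write(fd, b"v2+cache-tree")
    os.close(fd)

    # Commit: only now does the outside world see anything new.
    os.replace(lock_path, index)
    with open(index, "rb") as f:
        final = f.read()
    return helper_saw, final
```

Note that if the second write fails halfway, "index.lock" no longer
holds what the helper left there, which is the case I am worried about.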

> What do you gain by your proposal to lock "index.lock" file?  We
> know we already have "index.lock", so nobody should be competing on
> mucking with its contents with us and we gain nothing by writing
> into index.lock.lock and renaming it to index.lock.  We are in total
> control of the lifetime of index.lock, when we spawn subprograms on
> it to let them write to it, when we ourselves write to it, when we
> spawn subprograms on it to let them read from it, all under the lock
> we have on the "index", i.e. "index.lock".
>
> The only thing use of another temporary file (that does not have to
> be a lock on "index.lock", by the way, because we have total control
> over when and who updates the file while we hold the "index.lock")
> would give you is that it allows you to make the success of the "do
> another of your thing" step optional.  While holding the lock, you
> close and let "add -i" work on it, and after it returns, instead of
> reopening, you write into yet another "index.lock.lock", expecting
> that it might fail and when it fails you can roll it back, leaving
> the contents "add -i" left in "index.lock" intact.  If you do not
> use the extra temporary file, you start from "index.lock" left by
> "add -i", write the updated index into "index.lock" and if you fail
> to write, you have to roll back the entire "index"---you lose the
> option to use the index left by "add -i" without repopulated
> cache-tree.  But in the index update context, I do not think such
> complexity is necessary.  If something fails, we should fail and
> roll back the entire "index".
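
The extra-temporary-file variant described in the quoted paragraph can
be sketched as follows. This is Python, not git code; update_index()
and the refine callback are hypothetical names, and the callback stands
in for the cache-tree repopulation step. The point is only that a
failure in the second step leaves the contents of "index.lock" intact.

```python
import os
import tempfile


def update_index(index_path, prepared, refine):
    """Write `prepared` under the lock, then try an optional refine step."""
    lock_path = index_path + ".lock"
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    try:
        os.write(fd, prepared)
    finally:
        os.close(fd)

    # Optional step: write into a second temporary file and rename it
    # over index.lock only if the write succeeded.
    scratch_path = lock_path + ".lock"
    refined = False
    try:
        with open(scratch_path, "wb") as scratch:
            scratch.write(refine(prepared))
        os.replace(scratch_path, lock_path)
        refined = True
    except OSError:
        # Roll back only the optional step; index.lock is untouched.
        if os.path.exists(scratch_path):
            os.unlink(scratch_path)

    os.replace(lock_path, index_path)
    return refined


def demo():
    workdir = tempfile.mkdtemp()
    index_path = os.path.join(workdir, "index")

    def add_cache_tree(data):
        return data + b"+cache-tree"

    def failing_step(data):
        raise OSError("simulated write failure")

    ok = update_index(index_path, b"staged-hunks", add_cache_tree)
    with open(index_path, "rb") as f:
        after_ok = f.read()

    failed = update_index(index_path, b"staged-hunks", failing_step)
    with open(index_path, "rb") as f:
        after_failure = f.read()
    return ok, after_ok, failed, after_failure
```

Without the second temporary file, the except branch would have to
unlink "index.lock" itself and the staged result would be gone.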

I am probably looking at the problem from the wrong angle. To me the
result of "commit -p" is precious. I'm not a big user of "commit -p"
myself, as I prefer "add -p", but it's the same: it'd be frustrating
if, after you have carefully added your chunks, the program aborts and
you have to start over. And not just a few chunks. Think of reviewing
.po files and approving strings by adding them to the index. Perhaps
it is because _I_, as a developer, see this cache-tree update step as
optional that I overreact to it. Ordinary users won't see any
difference. And perhaps a better way to save the result of
"commit/add -p" is some sort of index log, rather than being
over-protective in this "interactive commit" code block.

I don't feel strongly either way. So your call.
-- 
Duy