Re: [PATCH 3/4] refs.c: add a transaction function to append a reflog entry

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 3 Dec 2014 14:28:52 -0800

On Tue, Dec 02, 2014 at 07:15:19PM -0800, Jonathan Nieder wrote:

I'll think about rewriting the commit message, so it is easier to digest.
I'll follow your suggestions except for the following annotations

> 
> > +int transaction_update_reflog(struct transaction *transaction,
> > +			      const char *refname,
> > +			      const unsigned char *new_sha1,
> > +			      const unsigned char *old_sha1,
> > +			      const char *email,
> > +			      unsigned long timestamp, int tz,
> > +			      const char *msg, int flags,
> > +			      struct strbuf *err)
> 
> This is an intimidating list of arguments.  Would it make sense to
> pack them into a struct, or to make the list less intimidating
> some other way (e.g. combining email + timestamp + tz into an
> ident string)?

It's true, that is's a huge list. One of the reasons Ronnie gave, 
was having symmetry in reflog.c:expire_reflog_ent, which has a very
similar signature, the  email + timestamp + tz are passed in as separate
arguments.

expire_reflog_ent is being called from within reflog.c:expire_reflog

	if (for_each_reflog_ent(ref, expire_reflog_ent, &cb)) {
		status |= error("%s", err.buf);
		goto cleanup;
	}

now if, we dive into the for_each_reflog_ent function, it's essentially 
just calling show_one_reflog_ent on each reflog entry.
In there we are trying to separate a reflogs entry into the fields
(new sha1, old sha1, committer, email, time, timezone, message) by the
lovely expression

	/* old SP new SP name <email> SP time TAB msg LF */
	if (sb->len < 83 || sb->buf[sb->len - 1] != '\n' ||
	    get_sha1_hex(sb->buf, osha1) || sb->buf[40] != ' ' ||
	    get_sha1_hex(sb->buf + 41, nsha1) || sb->buf[81] != ' ' ||
	    !(email_end = strchr(sb->buf + 82, '>')) ||
	    email_end[1] != ' ' ||
	    !(timestamp = strtoul(email_end + 2, &message, 10)) ||
	    !message || message[0] != ' ' ||
	    (message[1] != '+' && message[1] != '-') ||
	    !isdigit(message[2]) || !isdigit(message[3]) ||
	    !isdigit(message[4]) || !isdigit(message[5]))
		return 0; /* corrupt? */

so I wonder if it actually makes sense to first separate it out into fields,
and then joining to an ident string again. Is there some kind of struct already,
which groups together identifying information, such as name, email, timestamp + zone,
which could be reused here for passing on the data?

Anyway, this is a first step on transactioning the reflog code, so it is minimally
invasive, not changing a lot of business logic. If you look into the next patch,
(reflog.c: use a reflog transaction when writing during expire)
you'll see that it's essentially just a replacement of printfs to calling
the function.

-	if (cb->newlog) {
-		fprintf(cb->newlog, "%s %s %s %lu %c%04d\t%s",
-			sha1_to_hex(osha1), sha1_to_hex(nsha1),
-			email, timestamp, sign, zone,
-			message);
+	if (cb->t) {
+		if (transaction_update_reflog(cb->t, cb->refname, nsha1, osha1,
+					      email, timestamp, tz, message, 0,
+					      cb->err))

and the errorhandling was removed.
If we want to change the function signature of transaction_update_reflog, 
I'd do it in a follow up?

> 

> 
> [...]
> > +	if (flags & REFLOG_TRUNCATE) {
> > +		if (lseek(fd, 0, SEEK_SET) < 0 ||
> > +			ftruncate(fd, 0)) {
> > +			strbuf_addf(err, "Could not truncate reflog: %s. %s",
> > +				    refname, strerror(errno));
> 
> Odd error message format (using '.' to separate the refname from
> strerror(errno) is unusual).  Errors normally are supposed to start
> with a lowercase letter, like
> 
> 	cannot truncate reflog '%s': %s
> 
> > +			goto failure;
> > +		}
> > +	}
> 
> How does this cause the reflog to be populated in the !(flags &
> REFLOG_TRUNCATE) case?
> 
> Maybe I am misunderstanding the API.  If I use
> transaction_update_reflog() and have not updated the reflog
> previously, isn't this supposed to just append a new entry to the
> reflog?
> 

He, that's indeed a good catch. I was investigating the API again myself and
the REFLOG_TRUNCATE flag is only set on the very first call to transaction_update_reflog
and the subsequent calls are rebuilding the reflog by adding the unpruned lines 
again line by line.

I am wondering if we should actually put that into the same function now.
As the new reflog is build on the calls by transaction_update_reflog, it's actually
depending on the order strictly as opposed to the refs.
Maybe we want to have two separate functions
	transaction_update_begin_reflog(...)
	transaction_update_reflog_add_line(...)

instead? Then a reflog update could look like this:
	t=transaction_begin(...)

	transaction_update_begin_reflog(t, refname)
	for_each_reflog_entry_for_refname(entry, refname):
		if (want_to_keep(entry):
			transaction_update_reflog_add_line(entry, refname)

	transaction_commit(...)

The logic of want_to_keep is currently in reflog.c:expire_reflog_ent(...),
not sure if there are other places.

> [...]
> > +failure:
> > +	strbuf_release(&buf);
> > +	/*
> > +	 * As we are using the lock file API, any stale files left behind will
> > +	 * be taken care of, no need to do anything here.
> > +	 */
> 
> That's only true if the caller is going to exit instead of proceeding
> to other work.
> 
> With current callers, I assume that's true.  So should this comment
> say something like "No need to roll back stale lock files because
> the caller will exit soon"?  Or should this roll back the lockfile
> anyway, in case the caller wants to try again?

I am not sure if we ever want transactions be tried again without the user 
explicitely rerunning the command?

As we're operating on a lockfile, which was just created by us, and now in 
the failure command, we're likely to die? Maybe I remove the comment altogether, 
as the wondering reader will look into the API for lock files, when looking 
for problems here.

> > +	/* Commit all reflog updates*/
> > +	for_each_string_list_item(item, &transaction->reflog_updates) {
> > +		struct lock_file *lock = item->util;
> > +		commit_lock_file_to(lock, git_path("logs/%s", item->string));
> 
> Neat.
> 
> This seems like a good starting point.  Other code paths still write
> to reflogs directly --- is there a plan to get them to use this code,
> too, in a followup patch (e.g., by making write_ref_sha1() or
> log_ref_write() use its own small transaction for the reflog update)?
> 

That's the plan, but not part of this patch series yet.

> Thanks and hope that helps,
> Jonathan

Thanks for the comments,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html