Re: [PATCH 02/15] refs.c: return error instead of dying when locking fails during transaction

Ronnie Sahlberg <sahlberg@xxxxxxxxxx> · Tue, 11 Nov 2014 07:42:53 -0800

On Tue, Nov 11, 2014 at 2:34 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Oct 21, 2014 at 01:36:47PM -0700, Ronnie Sahlberg wrote:
>
>> commit e193c10fc4f9274d1e751cfcdcc4507818e8d498 upstream.
>>
>> Change lock_ref_sha1_basic to return an error instead of dying when
>> we fail to lock a file during a transaction.
>> This function is only called from transaction_commit() and it knows how
>> to handle these failures.
>> [...]
>> -             else
>> -                     unable_to_lock_die(ref_file, errno);
>> +             else {
>> +                     struct strbuf err = STRBUF_INIT;
>> +                     unable_to_lock_message(ref_file, errno, &err);
>> +                     error("%s", err.buf);
>> +                     strbuf_reset(&err);
>> +                     goto error_return;
>> +             }
>
> I coincidentally just wrote almost the identical patch, because this
> isn't just a cleanup; it fixes a real bug. During pack_refs, we call
> prune_ref to lock and delete the loose ref. If the lock fails, that's
> OK; that just means somebody else is updating it at the same time, and
> we can skip our pruning step. But due to the unable_to_lock_die call
> here in lock_ref_sha1_basic, the pack-refs process may die prematurely.
>
> I wonder if it is worth pulling this one out from the rest of the
> series, as it has value (and can be applied) on its own. I did some
> digging on the history of this, too. Here's the rationale I wrote:
>
>
>     lock_ref_sha1_basic: do not die on locking errors
>
>     lock_ref_sha1_basic is inconsistent about when it calls
>     die() and when it returns NULL to signal an error. This is
>     annoying to any callers that want to recover from a locking
>     error.
>
>     This seems to be mostly historical accident. It was added in
>     4bd18c4 (Improve abstraction of ref lock/write.,
>     2006-05-17), which returned an error in all cases except
>     calling safe_create_leading_directories, in which case it
>     died.  Later, 40aaae8 (Better error message when we are
>     unable to lock the index file, 2006-08-12) asked
>     hold_lock_file_for_update to die for us, leaving the
>     resolve_ref code-path the only one which returned NULL.
>
>     We tried to correct that in 5cc3cef (lock_ref_sha1(): do not
>     sometimes error() and sometimes die()., 2006-09-30),
>     by converting all of the die() calls into returns. But we
>     missed the "die" flag passed to the lock code, leaving us
>     inconsistent. This state persisted until e5c223e
>     (lock_ref_sha1_basic(): if locking fails with ENOENT, retry,
>     2014-01-18). Because of its retry scheme, it does not ask
>     the lock code to die, but instead manually dies with
>     unable_to_lock_die().
>
>     We can make this consistent with the other return paths by
>     converting this to use unable_to_lock_message(), and
>     returning NULL. This is safe to do because all callers
>     already needed to check the return value of the function,
>     since it could fail (and return NULL) for other reasons.
>
> I also have some other cleanups to lock_ref_sha1_basic's error handling.
> I'd be happy to take over this patch and send it along with those
> cleanups as a separate series.

 Sounds Good To Me.

Thanks,
Ronnie Sahlberg
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html