On Mon, Aug 21, 2017 at 01:51:34PM +0200, Michael Haggerty wrote: > The philosophy of reference locking has been, "if another process is > changing a reference, then whatever I'm trying to do to it will > probably fail anyway because my old-SHA-1 value is probably no longer > current". But this argument falls down if the other process has locked > the reference to do something that doesn't actually change the value > of the reference, such as `pack-refs` or `reflog expire`. There > actually *is* a decent chance that a planned reference update will > still be able to go through after the other process has released the > lock. > > So when trying to lock an individual reference (e.g., when creating > "refs/heads/master.lock"), if it is already locked, then retry the > lock acquisition for approximately 100 ms before giving up. This > should eliminate some unnecessary lock conflicts without wasting a lot > of time. It will probably comes as little surprise to others on the list that I agree with the intent of this patch. This version is cleaned up a little from what we're running at GitHub right now, so I'll try to review with fresh eyes. > diff --git a/Documentation/config.txt b/Documentation/config.txt > index d5c9c4cab6..2c04b9dfb4 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -776,6 +776,12 @@ core.commentChar:: > If set to "auto", `git-commit` would select a character that is not > the beginning character of any line in existing commit messages. > > +core.filesRefLockTimeout:: > + The length of time, in milliseconds, to retry when trying to > + lock an individual reference. Value 0 means not to retry at > + all; -1 means to try indefinitely. Default is 100 (i.e., > + retry for 100ms). > + > core.packedRefsTimeout:: > The length of time, in milliseconds, to retry when trying to > lock the `packed-refs` file. Value 0 means not to retry at Do we need a separate config from packedRefsTimeout that is in the context? I guess so, since the default values are different. And rightfully so, I think, since writing a packed-refs file is potentially a much larger operation (being O(n) in the number of refs rather than a constant 40 bytes). It probably doesn't matter all that much either way, as I wouldn't expect people to need to tweak either of these in practice. At some point when we have another ref backend that needs to take a global lock, we'd probably have a third timeout. If we were starting from scratch, I'd suggest that these might be part of refstorage.files.* instead of core.*. And then eventually we might have refstorage.reftable.timeout, refstorage.lmdb.timeout, etc. But since core.packedRefsTimeout has already sailed, I'm not sure it's worth caring about. > diff --git a/refs.c b/refs.c > index fe4c59aa8b..29dbb9b610 100644 > --- a/refs.c > +++ b/refs.c > @@ -561,6 +561,21 @@ enum ref_type ref_type(const char *refname) > return REF_TYPE_NORMAL; > } > > +long get_files_ref_lock_timeout_ms(void) > +{ > + static int configured = 0; > + > + /* The default timeout is 100 ms: */ > + static int timeout_ms = 100; > + > + if (!configured) { > + git_config_get_int("core.filesreflocktimeout", &timeout_ms); > + configured = 1; > + } > + > + return timeout_ms; > +} This reads the config value into a static local that gets cached for the rest of the program run, and there's no way to invalidate the cache. I think that should be OK, as we would never hit the ref code until we've actually initialized the repository, at which point we're fairly well committed. (I'm a little gun-shy because I used a similar lazy-load pattern for core.sharedrepository a while back, and those corner cases bit us. But there it was heavily used by git-init, which wants to "switch" repos after having loaded some values). > @@ -573,7 +588,9 @@ static int write_pseudoref(const char *pseudoref, const unsigned char *sha1, > strbuf_addf(&buf, "%s\n", sha1_to_hex(sha1)); > > filename = git_path("%s", pseudoref); > - fd = hold_lock_file_for_update(&lock, filename, LOCK_DIE_ON_ERROR); > + fd = hold_lock_file_for_update_timeout(&lock, filename, > + LOCK_DIE_ON_ERROR, > + get_files_ref_lock_timeout_ms()); > if (fd < 0) { > strbuf_addf(err, "could not open '%s' for writing: %s", > filename, strerror(errno)); The rest of the patch looks obviously correct to me. The one thing I didn't audit for was whether there were any calls that _should_ be changed that you missed. But I'm pretty sure based on our previous off-list discussions that you just did that audit. Obviously new ref-locking calls would want to use the timeout variant here, too, and we need to remember to do so. But I don't think there should be many, as most sites should be going through create_reflock() or lock_raw_ref(), which you touch here. -Peff