On Thu, Dec 09, 2021 at 08:11:34AM +0100, Patrick Steinhardt wrote: > > For the files-backend code, I think system errors would naturally fall > > out in the same code. Failing to write() or rename() is not much > > different than failing to get the lock in the first place. So > > "partial-update" and "non-atomic" behavior would end up the same anyway, > > and we do not need to expose the third mode to the user. > > I think I disagree here. Failing to write() to me is quite different > from failing to take a lock: the first one is an unexpected system-level > error and brings us into a situation where we ain't got no clue why it > happened. The second one is a logical error that is entirely expected > given that lockfiles are explicitly designed for this failure mode, so > we know why they happen. With this in mind, I'd argue that we should > only continue with the transaction in the latter case, and abort on > unexpected system-level errors. Just to be clear, it's not that I necessarily think these error types are logically related. I only meant that once you are detecting and recovering from one type, it would be easy to implement it either way. I'd be OK with either type of behavior. If that was the only difference between partial-update and non-atomic, though, I'm not sure if that merits exposing the complexity of a "third mode" to the user. But I don't feel strongly about it either way. > > I suspect the surgery needed for the ref-transaction code to allow > > non-atomic updates would be pretty big, though. It involves checking > > every error case to make sure it is safe to continue rather than > > aborting (and munging data structures to mark particular refs as > > "failed, don't do anything further for this one"). > > I hope that it's not going to be that bad if we restrict it to the > "prepare" phase, but that may just be wishful thinking. Yeah, maybe. :) I didn't look closely, so it may not be too bad. I just remember the refs system being very finicky about things like failure, races, etc. But I'm sure you'll figure it out once you start looking closely. :) One thing to watch out for is that in the files backend, _part_ of the update may be shared by multiple refs: updating the packed-refs file (if we are deleting refs). So if you are deleting "refs/heads/foo" and "refs/heads/bar", but taking the lock on "foo" fails, you'd want to make sure only to delete "bar" from packed-refs, not "foo". -Peff