On 04/16/2014 09:31 PM, Junio C Hamano wrote:
> Ronnie Sahlberg <sahlberg@xxxxxxxxxx> writes:
>
>> Currently any locking of refs in a transaction only happens during the
>> commit phase. I think it would be useful to have a mechanism where you
>> could optionally take out locks for the involved refs early during the
>> transaction. So that simple callers could continue using
>>
>>     ref_transaction_begin()
>>     ref_transaction_create|update|delete()*
>>     ref_transaction_commit()
>>
>> but a caller such as walker_fetch() could opt to do
>>
>>     ref_transaction_begin()
>>     ref_transaction_lock_ref()*
>>     ...do stuff...
>>     ref_transaction_create|update|delete()*
>>     ref_transaction_commit()
>>
>> In this second case ref_transaction_commit() would only take out any
>> locks that are still missing during the "lock the refs" loop.
>>
>> Suggestion 1: Add a ref_transaction_lock_ref() to allow locking a ref
>> early during a transaction.
>
> Hmph.
>
> I am not sure if that is the right way to go, or instead change all
> create/update/delete to take locks without adding a new primitive.

Junio's suggestion seems like a good idea to me.

Obviously, as soon as we take out a lock we could also do any applicable old_sha1 check and possibly fail fast.

Does a "verify" operation require holding a lock? If not, when is the verification done--early, or during the commit, or both? (I realize that we don't yet *have* a verify operation at the API level, but we should!)

We also need to think about for what period of time we have to hold the packed-refs lock.

Finally, we shouldn't forget that currently the reflog files are locked by holding the lock on the corresponding loose reference file. Do we need to integrate these files into our system any more than they currently are?

[By the way, I noticed the other day that the command

    git reflog expire --stale-fix --expire-unreachable=now --all

can hold loose reference locks for a *long* time (like 10s of minutes), especially in weird cases like the repository having 9000 packfiles for some reason or another :-) The command execution time grows strongly with the length of the reference's log, or maybe even the square of the length, assuming the history is roughly linear and reachability is computed separately for each SHA-1. This is just an empirical observation so far; I haven't looked into the code yet.]

>> A second idea is to change the signatures for
>> ref_transaction_create|update|delete() slightly and allow them to
>> return errors early. We can check for things like add_update() failing,
>> check that the ref-name looks sane, check some of the flags, like if
>> has_old==true then old sha1 should not be NULL or 0{40}, etc.
>>
>> Additionally for robustness, if any of these functions detect an error
>> we can flag this in the transaction structure and take action during
>> ref_transaction_commit(). I.e. if a ref_transaction_update had a hard
>> failure, do not allow ref_transaction_commit() to succeed.
>>
>> Suggestion 2: Change ref_transaction_create|delete|update() to return
>> an int. All callers that use these functions should check the function
>> for error.
>
> I think that is a very sensible thing to do.
>
> The details of determining "this cannot possibly succeed" may change
> (for example, if we have them take locks at the point of
> create/delete/update, a failure to lock may count as an early error).
>
> Is there any reason why this should be conditional (i.e. you said
> "allow them to", implying that the early failure is optional)?

Also a good suggestion.
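To make that concrete, here is a tiny self-contained sketch (toy code only, nothing to do with the actual refs.c implementation; every name is made up). It shows the shape Suggestion 2 seems to be aiming for: the queueing functions return an int, any hard failure is also remembered in the transaction, and commit then refuses to proceed even if the caller ignored the early return value:

#include <stdio.h>
#include <string.h>

/*
 * Toy code, not refs.c: every name here is invented.  It only shows
 * the calling shape of Suggestion 2: the queueing function returns an
 * int, any hard failure is remembered in the transaction, and commit
 * refuses to proceed even if the caller ignored the early error.
 */
struct toy_transaction {
	int nr_updates;
	int had_error;	/* set on any hard failure, checked at commit */
};

static int toy_transaction_update(struct toy_transaction *t,
				  const char *refname)
{
	/* Stand-in for "check that the ref-name looks sane", etc. */
	if (!*refname || refname[0] == '-' || strstr(refname, "..")) {
		fprintf(stderr, "error: bad refname '%s'\n", refname);
		t->had_error = 1;
		return -1;
	}
	t->nr_updates++;
	return 0;
}

static int toy_transaction_commit(struct toy_transaction *t)
{
	if (t->had_error) {
		fprintf(stderr, "error: refusing to commit after earlier failure\n");
		return -1;
	}
	printf("committing %d update(s)\n", t->nr_updates);
	return 0;
}

int main(void)
{
	struct toy_transaction t = { 0, 0 };

	/* A sloppy caller that ignores the early return values ... */
	toy_transaction_update(&t, "refs/heads/master");
	toy_transaction_update(&t, "-bad-refname");

	/* ... still cannot commit the half-broken transaction. */
	return toy_transaction_commit(&t) ? 1 : 0;
}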
We should make it clear in the documentation that the create/delete/update functions are not *obliged* to return an error (for example, that the current value of the reference does not agree with old_sha1), because future alternate ref backends might not be able to make such checks until the commit phase. For example, checking old_sha1 might involve a round-trip to a database or remote repository, in which case it might be preferable to check all values in a single round-trip upon commit. So callers might be informed of problems early, or they might only learn about problems when they try to commit the transaction. They have to be able to handle either type of error reporting.

So then the question arises (and maybe this is what Ronnie was getting at by suggesting that the checks might be conditional): are callers *obliged* to check the return values from create/delete/update, or are they allowed to just keep throwing everything into the transaction, ignoring errors, and only check the result of ref_transaction_commit()?

I don't feel strongly one way or the other about this question. It might be nice to be able to write callers sloppily, but it would cost a bit more code in the reference backends. Then again, maybe it wouldn't even be much extra code, given that we would probably want to put consistency checks in there anyway.

>> Suggestion 3: remove the qsort and check for duplicates in
>> ref_transaction_commit().
>> Since we are already taking out a lock for each ref we are updating
>> during the transaction, any duplicate refs will fail the second attempt
>> to lock the same ref, which will implicitly make sure that a
>> transaction will not change the same ref twice.
>
> I do not know if I care about the implementation detail of "do we
> have a unique set of update requests?". While I do not see a strong
> need for one transaction to touch the same ref twice (e.g. create to
> point at commit A and update it to point at commit B), I do not see
> why we should forbid such a use in the future.

I agree. For example, we might very well want to allow multiple updates to a single reference, as long as they are mutually consistent, to be coalesced into a single update.

I also expect that the error message in the current code is more illuminating than the generic error message that would result from the inability to lock a reference (because it is already locked by an earlier update in the same transaction!).

Let's leave this aspect the way it is now and revisit the topic later when we have learned more.

I see, though, that this will make early locking a little tricky to implement. Currently we sort all of the updates and look for duplicates *before* we acquire any locks. But if we want to acquire locks immediately, then (since the list isn't sorted yet) it will be expensive to look for duplicates before attempting to lock. I can see two possibilities:

1. Use a different data structure, like a hash map, for storing updates, so that we have a cheap way to look up whether there is already an update involving a reference (a toy sketch of this appears at the end of this mail).

2. Try to lock the reference, and if the lock fails, *then* look back through the list to see if we're the ones holding the lock already.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
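To make possibility 1 a bit more concrete, here is a minimal self-contained toy sketch (again not refs.c; all names are invented, and a linear scan stands in for the hash-map lookup). The only point is the ordering: check whether the transaction already has an update queued for the refname *before* trying to take the lock, so a duplicate gets a specific error message rather than the generic "cannot lock ref" one:

#include <stdio.h>
#include <string.h>

/*
 * Toy code, not refs.c: all names are invented, and a linear scan
 * stands in for the hash-map lookup of possibility 1.
 */
#define MAX_UPDATES 16

struct toy_transaction {
	const char *refnames[MAX_UPDATES];
	int nr;
};

/*
 * Possibility 1's cheap lookup: is there already an update queued for
 * this refname?  The real thing would be a hash map keyed by refname
 * so that the check stays cheap for large transactions.
 */
static int have_update_for(const struct toy_transaction *t,
			   const char *refname)
{
	int i;

	for (i = 0; i < t->nr; i++)
		if (!strcmp(t->refnames[i], refname))
			return 1;
	return 0;
}

/* Pretend to take the lock on a loose ref; always succeeds here. */
static int lock_ref(const char *refname)
{
	printf("locking %s\n", refname);
	return 0;
}

static int toy_transaction_update(struct toy_transaction *t,
				  const char *refname)
{
	/*
	 * Check for a duplicate *before* locking, so a second update of
	 * the same ref gets a specific error message instead of the
	 * generic "cannot lock ref" one.
	 */
	if (have_update_for(t, refname)) {
		fprintf(stderr, "error: multiple updates for ref '%s'\n",
			refname);
		return -1;
	}
	if (t->nr >= MAX_UPDATES || lock_ref(refname))
		return -1;
	t->refnames[t->nr++] = refname;
	return 0;
}

int main(void)
{
	struct toy_transaction t = { { NULL }, 0 };

	toy_transaction_update(&t, "refs/heads/master");
	toy_transaction_update(&t, "refs/heads/next");
	toy_transaction_update(&t, "refs/heads/master"); /* duplicate */
	return 0;
}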