On Mon, Mar 18, 2024 at 11:14:58PM +0100, Han-Wen Nienhuys wrote: > I had a quick look over the reftable bits of this series. It looks OK, > but here are some comments. Nothing blocking. > > * reftable/error: discern locked/outdated errors > > It is not obvious to me why you need two different codes. Is it so you > can print the offending lock file (so people can delete them > manually?). FWIW, this was based on JGit, which has > > /** > * The ref could not be locked for update/delete. > * <p> > * This is generally a transient failure and is > usually caused by > * another process trying to access the ref at the > same time as this > * process was trying to update it. It is possible a > future operation > * will be successful. > */ I just think that these are somewhat different failure modes. Not being able to lock a file because it already is locked is totally different than realizing that files have been concurrently modified and are thus out of date now. So the main motivator why I split up the error codes is that the error message will end up being shown to the user. Before we would say "data is outdated" for both cases. Now we either say "data is locked", or we say "data concurrently modified", which I think is a lot more helpful to the user. > * reftable/stack: gracefully handle failed auto-compaction due to locks > > It's a bit unsatisfying that you have to use details of the locking > protocol to test it, but I couldn't think of a way to unittest this > using only the API. Maybe it's worth considering removing the > automatic compaction from the reftable-stack.h API, and have the > caller (eg. in refs/reftable-backend.c) call it explicitly? I think it's not all that bad. Yes, we now need to know about error codes returned by the auto-compaction code. But it's still contained in the reftable library. If we were to remove it from the API itself then this knowledge would need to exist in all callers of the auto compaction code which live _outside_ of the library, which I think would be even worse. We probably could add a unit test that basically does the same as the added test in t0610. That is, we lock all tables in the stack and then check that `reftable_addition_commit()` fails gracefully when trying to compact the already-locked tables. But that actually highlights another issue that we will tackle soonish: I think that the compaction code should handle already-locked files more gracefully. E.g. when tables 0..N are locked by a concurrent process, then the reftable library should realize that and only try to compact tables N+1..M. Justin Tobler is currently working on some refactorings of the auto-compaction logic in [1], so we will tackle this issue once a revised version of his patch has landed. [1]: https://lore.kernel.org/git/pull.1683.git.1709669025722.gitgitgadget@xxxxxxxxx/ Patrick