On Mon, Oct 09, 2023 at 12:54:12PM +0200, Patrick Steinhardt wrote:
> In Gitaly, we usually set up quarantine directories for all operations
> that create objects. This allows us to discard any newly written objects
> in case either the RPC call gets cancelled or in case our access checks
> determine that the change should not be allowed. The logic is rather
> simple:
>
>     1. Create a new temporary directory.
>
>     2. Set up the new temporary directory as main object database via
>        the `GIT_OBJECT_DIRECTORY` environment variable.
>
>     3. Set up the main repository's object database via the
>        `GIT_ALTERNATE_OBJECT_DIRECTORIES` environment variable.

Is there a reason not to run Git in the quarantine environment and list
the main repository as an alternate via $GIT_DIR/objects/info/alternates
instead of the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable?

>     4. Execute Git commands that write objects with these environment
>        variables set up. The new objects will end up neatly contained in
>        the temporary directory.
>
>     5. Once done, either discard the temporary object database or
>        migrate objects into the main object database.

Interesting. I'm curious why you don't use the builtin tmp_objdir
mechanism in Git itself. Do you need to run more than one command in the
quarantine environment? If so, it makes sense that you'd want to have a
scratch repository that lasts beyond the lifetime of a single process.

> I wonder whether this would be a viable approach for you, as well.

I think that the main problem that we are trying to solve with this
series is creating a potentially large number of loose objects. I think
that you could do something like what you propose above, with a 'git
repack -adk' before moving its objects back over to the main repository.

But since we're working in a single process only when doing a merge-tree
operation, I think it is probably more expedient to write the pack's
bytes directly.

> Patrick

Thanks,
Taylor
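
For readers following along, the five quarantine steps quoted above can
be sketched roughly as below. This is only an illustration under stated
assumptions, not Gitaly's code and not Git's tmp_objdir implementation:
the helper names (new_quarantine, quarantine_env, migrate, discard) are
made up for this example, and the migration is a naive file move that
skips anything already present in the main object database.

```python
import os
import shutil
import tempfile


def new_quarantine():
    """Step 1: create a fresh temporary directory to receive new objects."""
    return tempfile.mkdtemp(prefix="quarantine-")


def quarantine_env(main_objects, quarantine_dir, base_env=None):
    """Steps 2 and 3: build the environment for Git commands run in quarantine."""
    env = dict(os.environ if base_env is None else base_env)
    # Newly written objects go into the quarantine directory ...
    env["GIT_OBJECT_DIRECTORY"] = quarantine_dir
    # ... while the main object database stays readable as an alternate.
    env["GIT_ALTERNATE_OBJECT_DIRECTORIES"] = main_objects
    return env


def migrate(quarantine_dir, main_objects):
    """Step 5 (accept): naively move quarantined files into the main database."""
    for root, _dirs, files in os.walk(quarantine_dir):
        rel = os.path.relpath(root, quarantine_dir)
        dest = os.path.join(main_objects, rel)
        os.makedirs(dest, exist_ok=True)
        for name in files:
            target = os.path.join(dest, name)
            # An object that already exists has the same content; keep it.
            if not os.path.exists(target):
                shutil.move(os.path.join(root, name), target)
    shutil.rmtree(quarantine_dir, ignore_errors=True)


def discard(quarantine_dir):
    """Step 5 (reject): throw away everything written during the operation."""
    shutil.rmtree(quarantine_dir, ignore_errors=True)


# Step 4, assuming `repo` is the path to an existing repository:
#
#   import subprocess
#   quarantine = new_quarantine()
#   env = quarantine_env(os.path.join(repo, ".git", "objects"), quarantine)
#   subprocess.run(["git", "-C", repo, "hash-object", "-w", "--stdin"],
#                  input=b"hello\n", env=env, check=True)
```

Since the environment is just a dictionary, it can be reused across
several Git invocations, which is the case where a quarantine that
outlives a single process is useful.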