Re: suspected race between packing and fetch (single case study)

yoh@xxxxxxxxxxxxxx · Wed, 13 Jan 2021 09:55:45 -0500

On Tue, 12 Jan 2021, Taylor Blau wrote:
> > > ++
> > > +*NOTE*: this operation can race with concurrent modification to the
> > > +source repository, similar to running `cp -r src dst` while modifying
> > > +`src`.

> > Couldn't `gc` be triggered by git in seemingly read-only operations,
> > thus possibly breaking the analogy with running `cp` while doing `rm`
> > (an explicit intent to modify)?

> > Moreover, the situation is also a bit different, since a sane user script
> > would not put `rm` into the background and keep operating on the original
> > source right before running `cp` -- and that is what is happening here:

> If you're suggesting that something is missing from the above patch, I'm
> not sure I quite understand what you would like added.

Slept on it.  I think your patch (doc disclaimer) is factually correct
and probably as good as it can get.  I am not yet sure whether it is
worth explicitly mentioning `gc` or `repack` as examples of such
concurrent operations.

> All of these (background gc, explicit rm-ing) fall under the category of
> "concurrent modification": they are changing the source directory in
> some way while a read operation is taking place.

Yes.  My comment was more about how such modifications are triggered:
via explicit actions intended to modify (e.g. `rm`) vs. as
"housekeeping running in the background", which is the case for gc,
in particular when it is triggered by seemingly read-only operations.

> > `git` operation is presumably complete (but leaves `gc` running in the
> > background) and the script advances to the next step, only to run into a
> > race condition with the preceding `git` command, which apparently triggered
> > `gc`.  Should any script which operates on local `git` repositories then
> > remember to add `-c gc.autodetach=0` to every git invocation which might
> > potentially be affected?

> If your workflow is that you are frequently cloning via the local
> transport and there is no other synchronization going on between
> whatever work is happening in the source repository, then yes. (But note
> of course that you can set gc.autodetach=0 via the source repository's
> .git/config rather than typing it each time).
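
A minimal Python sketch of the two options discussed above, for illustration only: passing the override on every invocation versus setting it once in the source repository's config. The helper names `git_argv_with_override` and `git_argv_set_source_config` are hypothetical; only the `git -c`, `git -C` and `git config` command-line forms are real. The functions just build argument lists, so they can be inspected without running git.

```python
from typing import List


def git_argv_with_override(*args: str) -> List[str]:
    """Per-invocation option: prepend `-c gc.autoDetach=false` to a git command.

    Every call site that might trigger auto-gc has to remember to use this.
    """
    return ["git", "-c", "gc.autoDetach=false", *args]


def git_argv_set_source_config(source_repo: str) -> List[str]:
    """One-time option: persist gc.autoDetach=false in the source repo's config.

    Afterwards, auto-gc triggered by commands run in `source_repo` stays in the
    foreground, so the triggering command returns only once gc has finished.
    """
    return ["git", "-C", source_repo, "config", "gc.autoDetach", "false"]


if __name__ == "__main__":
    print(git_argv_with_override("fetch", "origin"))
    print(git_argv_set_source_config("/path/to/src"))
```

Either list can be passed to `subprocess.run(argv, check=True)` to execute it. The second form needs to run only once per source repository, but it applies only to that repository and has to be known about in advance.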

IMHO it affects efficiency, becomes cumbersome (for git users), and thus
might be error-prone: e.g. gc.autodetach=0 is necessary only to guard
against a possible subsequent `clone` invocation operating locally.
Higher-level constructs sitting on top of `git` (like datalad in our
case) would not know which command the user script runs next, so they
cannot tell whether to set such a config variable for their own
invocations.  Adding gc.autodetach=0 to every single `git` invocation
would affect our efficiency.  Users might not become aware that this is
needed for `git clone` on local repositories until their scripts are
deployed and, at random points in time, start hitting the race
condition, sending them into "google" and RTFM mode to figure out what
is going on.

That is why I am more in line with your initial comment in
https://lore.kernel.org/git/X%2FipCPFyW3gAWrHo@nand.local/ :

> Perhaps Git could take some sort of lock when writing to the object
> store, but an flock wouldn't work since we'd want to allow multiple
> readers to acquire the lock simultaneously, so long as there is no
> writer.
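
To illustrate the semantics quoted above (many concurrent readers, or one exclusive writer), here is a minimal in-process readers-writer lock built on Python's `threading.Condition`. It is only a sketch of the locking discipline: as discussed in this thread, Git takes no such lock on the object store, and coordinating separate git processes would need an inter-process mechanism, which is the hard part. The class name `ReadersWriterLock` is hypothetical.

```python
import threading
from contextlib import contextmanager


class ReadersWriterLock:
    """Many concurrent readers OR one exclusive writer (in-process only)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    @contextmanager
    def read(self):
        with self._cond:
            # Readers only wait while a writer is active.
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # wake a waiting writer

    @contextmanager
    def write(self):
        with self._cond:
            # A writer waits until there are no readers and no other writer.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()  # wake waiting readers and writers
```

This variant prefers readers, so a steady stream of readers (e.g. clones) could starve a writer (e.g. a repack); a real design would have to pick a fairness policy.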

I think it would be nice to have `clone_local()` first check that there
are no ongoing modifications before proceeding, wait some reasonable
amount of time (up to ?0 sec?) if they are still ongoing, and then fail
"informatively" if it still cannot clone.  Even though this would not
fully prevent the race condition (`clone_local` might check and start,
and then some other process could begin altering the repository while
`clone_local` is still running), it would mitigate the scripted cases of
a local `git clone` following heavy manipulation of the original
repository that triggers a background gc.
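
The check-wait-fail idea can be approximated outside Git in a wrapper script. Below is a minimal Python sketch, not a patch to `clone_local()` (which is C code inside Git). It assumes that the presence of a `gc.pid` file in the source `$GIT_DIR` is a usable, imperfect signal that `git gc` is running; a stale `gc.pid`, or a `git repack` started on its own, would fool it. The names `gc_in_progress`, `wait_until_quiescent` and `RepositoryBusyError` are hypothetical. The clock and sleep functions are injectable so the polling logic can be tested without waiting.

```python
import os
import time
from typing import Callable


class RepositoryBusyError(RuntimeError):
    """Raised when the source repository still looks busy after the timeout."""


def gc_in_progress(git_dir: str) -> bool:
    # Heuristic (assumption): `git gc` keeps a `gc.pid` file in $GIT_DIR while
    # it runs. Stale pid files and standalone repacks are not detected.
    return os.path.exists(os.path.join(git_dir, "gc.pid"))


def wait_until_quiescent(
    is_busy: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `is_busy` until it returns False, or raise after `timeout` seconds."""
    deadline = clock() + timeout
    while is_busy():
        remaining = deadline - clock()
        if remaining <= 0:
            # Fail "informatively" rather than cloning a repository in flux.
            raise RepositoryBusyError(
                f"source repository still being modified after {timeout:g}s; "
                "not cloning (retry later, or run gc in the foreground)"
            )
        sleep(min(poll_interval, remaining))
```

A wrapper would call `wait_until_quiescent(lambda: gc_in_progress("/path/to/src/.git"), timeout=30)` right before `git clone /path/to/src dst`. As noted above, this narrows the window but does not close it: gc could still start after the check passes.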

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
WWW:   http://www.linkedin.com/in/yarik