Re: [PATCH] doc/reftable: document how to handle windows

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> · Tue, 26 Jan 2021 12:38:38 +0100

On Tue, Jan 26, 2021 at 6:49 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> >  $ cat .git/reftable/tables.list
> > -00000001-00000001.log
> > -00000002-00000002.ref
> > -00000003-00000003.ref
> > +00000001-00000001-RANDOM1.log
> > +00000002-00000002-RANDOM2.ref
> > +00000003-00000003-RANDOM3.ref
> >  ....
> > @@ -940,7 +944,7 @@ new reftable and atomically appending it to the stack:
> >  3.  Select `update_index` to be most recent file's
> >  `max_update_index + 1`.
> >  4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
> > -5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
> > +5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}-${random}.ref`.
> >  6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
> >  7.  Rename `tables.list.lock` to `tables.list`.
>
> Is this because we have been assuming that in step 5. we can
> "overwrite" (i.e. take over the name, implicitly unlinking the
> existing one) the existing 0000001-00000001.ref with the newly
> prepared one, which is not doable on Windows?

No, the protocol for adding a table to the end of the stack is
impervious to problems on Windows: everything happens under the lock,
so there is no possibility of name collisions.
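To illustrate why appending cannot collide, here is a minimal Python sketch of steps 3 to 7 of the quoted protocol. It is not the reftable implementation. It assumes the lock is taken by exclusively creating tables.list.lock, reads max_update_index from the newest file name instead of the table header, and uses an 8-hex-digit random suffix. The helper names table_name and append_table are hypothetical.

```python
import os
import secrets
import tempfile


def table_name(min_idx: int, max_idx: int, ext: str = "ref") -> str:
    # Hypothetical naming helper: 8-digit padding mirrors the tables.list
    # example; the 8-hex-digit random suffix is an assumption.
    return f"{min_idx:08d}-{max_idx:08d}-{secrets.token_hex(4)}.{ext}"


def append_table(stack_dir: str, payload: bytes) -> str:
    list_path = os.path.join(stack_dir, "tables.list")
    lock_path = list_path + ".lock"
    # Take the lock: mode "x" is O_CREAT|O_EXCL, so this raises
    # FileExistsError if another writer already holds tables.list.lock.
    lock = open(lock_path, "x")
    published = False
    try:
        try:
            with open(list_path) as f:
                names = f.read().split()
        except FileNotFoundError:
            names = []
        # Step 3: one past the newest table's max_update_index (parsed from
        # its file name here, as a simplification).
        update_index = (int(names[-1].split("-")[1]) if names else 0) + 1
        # Step 4: write the new table to a temporary file.
        tmp_fd, tmp_path = tempfile.mkstemp(prefix="tmp_", dir=stack_dir)
        with os.fdopen(tmp_fd, "wb") as tmp:
            tmp.write(payload)
        # Step 5: give it its final name. Only the lock holder picks names
        # for this update_index, so no other writer can race us here.
        new_name = table_name(update_index, update_index)
        os.rename(tmp_path, os.path.join(stack_dir, new_name))
        # Step 6: the lock file receives the old list plus the new table.
        lock.writelines(n + "\n" for n in names + [new_name])
        lock.close()
        # Step 7: publish the new stack by renaming the lock over tables.list.
        os.replace(lock_path, list_path)
        published = True
        return new_name
    finally:
        if not published:
            # Release our lock on failure; any table file already written
            # is left behind, unreferenced.
            lock.close()
            os.unlink(lock_path)
```

Two writers racing for the lock cannot both reach step 5; the loser gets FileExistsError before touching any table file and has to retry.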

> We must prepare for two "randoms" colliding and retrying the
> renaming step anyway, so would it make more sense to instead
> use a non-random suffix (i.e. try "-0.ref" first, and when it
> fails, readdir for 0000001-00000001-*.ref to find the latest
> suffix and increment it)?

This adds a lot of complexity. Also, both transactions and compactions
can always fail, either because they cannot acquire the lock or because
the data to be written is out of date, so callers need to be prepared
to retry anyway.
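The retry obligation can be sketched as a small wrapper. This is a minimal Python sketch, assuming lock contention surfaces as FileExistsError (as with an exclusive create of tables.list.lock) and that an out-of-date write is signalled by a hypothetical StaleStackError; with_retries is likewise a made-up name. Since the operation already has to be restartable, a rare name collision, if detected, needs no extra machinery.

```python
import random
import time


class StaleStackError(Exception):
    """Hypothetical: the change was computed against an outdated stack."""


def with_retries(operation, attempts: int = 5, base_delay: float = 0.01):
    # `operation` must start from scratch on each call (re-read tables.list,
    # recompute the update), since the previous attempt may have been stale.
    for attempt in range(attempts):
        try:
            return operation()
        except (FileExistsError, StaleStackError):
            if attempt == attempts - 1:
                raise  # give up and let the caller report the failure
            # Randomized exponential backoff so contending writers spread out.
            time.sleep(base_delay * (2 ** attempt) * random.random())
```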

> > @@ -993,7 +997,7 @@ prevents other processes from trying to compact these files.
> >  should always be the case, assuming that other processes are adhering to
> >  the locking protocol.
> >  7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
> > -`${min_update_index}-${max_update_index}.ref`.
> > +`${min_update_index}-${max_update_index}-${random}.ref`.
> >  8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
> >  with the file from (4).
>
> Likewise.

This case is different. Consider the following situation:

1-1.ref:
  main = abc123 @ timestamp 1
  master = abc123 @ timestamp 1
2-2.ref:
  bla = 456def @ timestamp 2
3-3.ref:
  bla delete @ timestamp 3
  master delete @ timestamp 3

The result of compacting these tables together would be a table
containing

  main = abc123 @ timestamp 1

but under the previous naming convention, we'd name the resulting table
"1-1.ref", which conflicts with the existing 1-1.ref from our starting
situation.

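The example can be replayed with a small Python sketch. It models each table as a dict from refname to (update_index, value), with None marking a deletion. It assumes the whole stack is being compacted, so deletions can be dropped, and, following the example, that the output name is derived from the update indices of the records that survive. compact and old_style_name are hypothetical helpers, not reftable code.

```python
import secrets


def compact(tables):
    """Merge tables given oldest first; the newest record per refname wins."""
    merged = {}
    for table in tables:
        merged.update(table)  # later (newer) tables override older records
    # Compacting the whole stack: deletion records have nothing left to
    # shadow, so they are dropped along with the refs they delete.
    return {ref: rec for ref, rec in merged.items() if rec[1] is not None}


def old_style_name(table):
    # Hypothetical: name from the update indices of the surviving records.
    indices = [idx for idx, _ in table.values()]
    return f"{min(indices)}-{max(indices)}.ref"


t1 = {"main": (1, "abc123"), "master": (1, "abc123")}  # 1-1.ref
t2 = {"bla": (2, "456def")}                             # 2-2.ref
t3 = {"bla": (3, None), "master": (3, None)}            # 3-3.ref

result = compact([t1, t2, t3])
print(result)                  # {'main': (1, 'abc123')}
print(old_style_name(result))  # 1-1.ref, the name of an input table

# With a random component, the output name no longer matches 1-1.ref.
indices = [idx for idx, _ in result.values()]
print(f"{min(indices)}-{max(indices)}-{secrets.token_hex(4)}.ref")
```
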
-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.