Re: [PATCH v2 5/8] repack: add `--filter=<filter-spec>` option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 24, 2023 at 8:28 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> >> Minor nit.  Aren't the above two the same use case?
> >
> > In the first case only some large blobs are moved to slower storage
> > and in the other case all the blobs are moved to slower storage. So
> > yeah the use cases are very similar. Not sure if and how I can improve
> > the above wording though.
>
> Just by removing one or the other, it would be quite improved, no?
> Moving away some blobs could move away all or just a selected
> subset.

Ok, I have done that in my current version.

> > I think it can work if the call to write_filtered_pack() is either
> > before the call to close_object_store() or after it. It would just use
> > the tempfiles with their temporary name in the first case and with
> > their final name in the second case.
> >
> > write_filtered_pack() is very similar to write_cruft_pack() which is
> > called before the call to close_object_store(), so I prefer to keep it
> > before that call too, if possible, for consistency.
>
> As long as the set-up is not racy, either would be OK, as the names
> are not recorded in the end result.
>
> If the upstream tells the downstream the temporary's name and then
> finializes the temporary to the final name before the downstream
> reacts to the input,

It doesn't seem to me that it's what happens.

We have the following order:

  - finish_pack_objects_cmd() is called for the first pack-objects
process. It populates the 'names' string_list with the temporary name
of the packfile it generated (which doesn't contain the filtered out
objects) and calls finish_command() to finish the first pack-objects
process. So as far as I understand nothing can be written anymore to
the packfile when finish_pack_objects_cmd() returns.

  - write_filtered_pack() is called. It starts the second pack-objects
process and passes it the temporary name of the packfile that was just
written, taking it from the 'names' string_list. It then calls
finish_pack_objects_cmd() for the second process which populates the
'names' string_list with the temporary name of the packfile created by
the second process and finishes the second process. So nothing can
then be written in the second packfile anymore.

  - close_object_store() is called which renames the packfiles from
the 'names' string_list giving them their final name.

So the final names are given only once both processes are finished and
both packfiles have been fully written.

> however, then by the time downstream starts
> working on the file, the file may not exist under its original,
> temporary name, and that kind of race was what I was worried about.
>
> > Perhaps this could be dealt with separately though, as I think we
> > might want to fix write_cruft_pack() first then.
>
> Sorry, I am not understanding this quite well.  Do you mean we
> should add one more known-to-be-racy-and-incorrect codepath because
> there is already a codepath that needs to be fixed anyway?

No.

> If write_cruft_pack() has a similar issue, then yeah, let's fix that
> first (testing might be tricky for any racy bugs).  And let's use
> the same technique as used to fix it in this series, too, so that we
> do not reintroduce a similar bug due to racy setup.

Yeah, that's what I mean.

I am not sure the race actually exists though. I have tried to explain
why it seems to me that things look correct, but from previous
experience I know that you are very often right, and I might have
missed something.

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux