On Mon, Jul 24, 2023 at 8:28 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Christian Couder <christian.couder@xxxxxxxxx> writes: > > >> Minor nit. Aren't the above two the same use case? > > > > In the first case only some large blobs are moved to slower storage > > and in the other case all the blobs are moved to slower storage. So > > yeah the use cases are very similar. Not sure if and how I can improve > > the above wording though. > > Just by removing one or the other, it would be quite improved, no? > Moving away some blobs could move away all or just a selected > subset. Ok, I have done that in my current version. > > I think it can work if the call to write_filtered_pack() is either > > before the call to close_object_store() or after it. It would just use > > the tempfiles with their temporary name in the first case and with > > their final name in the second case. > > > > write_filtered_pack() is very similar to write_cruft_pack() which is > > called before the call to close_object_store(), so I prefer to keep it > > before that call too, if possible, for consistency. > > As long as the set-up is not racy, either would be OK, as the names > are not recorded in the end result. > > If the upstream tells the downstream the temporary's name and then > finializes the temporary to the final name before the downstream > reacts to the input, It doesn't seem to me that it's what happens. We have the following order: - finish_pack_objects_cmd() is called for the first pack-objects process. It populates the 'names' string_list with the temporary name of the packfile it generated (which doesn't contain the filtered out objects) and calls finish_command() to finish the first pack-objects process. So as far as I understand nothing can be written anymore to the packfile when finish_pack_objects_cmd() returns. - write_filtered_pack() is called. It starts the second pack-objects process and passes it the temporary name of the packfile that was just written, taking it from the 'names' string_list. It then calls finish_pack_objects_cmd() for the second process which populates the 'names' string_list with the temporary name of the packfile created by the second process and finishes the second process. So nothing can then be written in the second packfile anymore. - close_object_store() is called which renames the packfiles from the 'names' string_list giving them their final name. So the final names are given only once both processes are finished and both packfiles have been fully written. > however, then by the time downstream starts > working on the file, the file may not exist under its original, > temporary name, and that kind of race was what I was worried about. > > > Perhaps this could be dealt with separately though, as I think we > > might want to fix write_cruft_pack() first then. > > Sorry, I am not understanding this quite well. Do you mean we > should add one more known-to-be-racy-and-incorrect codepath because > there is already a codepath that needs to be fixed anyway? No. > If write_cruft_pack() has a similar issue, then yeah, let's fix that > first (testing might be tricky for any racy bugs). And let's use > the same technique as used to fix it in this series, too, so that we > do not reintroduce a similar bug due to racy setup. Yeah, that's what I mean. I am not sure the race actually exists though. I have tried to explain why it seems to me that things look correct, but from previous experience I know that you are very often right, and I might have missed something. Thanks.