Patrick Steinhardt <ps@xxxxxx> writes:

>> It would be much more preferable to see this series go like so:
>>
>>  [1/4] create append_fetch_head() that writes out to
>>        fetch_head->fp
>>
>>  [1.5/4] convert append_fetch_head() to ALWAYS accumulate in
>>          fetch_head->buf, and ALWAYS flushes what is accumulated
>>          at the end.
>
> This is a change I'm hesitant to make. The thing is that FETCH_HEAD is
> quite weird as the current design allows different `git fetch --append`
> processes to write to FETCH_HEAD at the same time.

Hmph, do you mean

    git fetch --append remote-a &
    git fetch --append remote-b

or something else [*1*]?  With the current write-out codepath, there
is no guarantee that ...

> > +	fprintf(fetch_head->fp, "%s\t%s\t%s",
> > +		old_oid, merge_status_marker, note);
> > +	for (i = 0; i < url_len; ++i)
> > +		if ('\n' == url[i])
> > +			fputs("\\n", fetch_head->fp);
> > +		else
> > +			fputc(url[i], fetch_head->fp);
> > +	fputc('\n', fetch_head->fp);

... these stdio operations for a single record would result in a
single atomic write() that would not compete with writes by other
processes.  So I wouldn't call it "the current design allows ... at
the same time."  It has been broken for years and nobody complained.

> If we change to
> always accumulate first and flush once we're done, then we essentially
> have a change in behaviour as FETCH_HEADs aren't written in an
> interleaving fashion anymore, but only at the end.

I view it as a good thing, actually, for another reason [*2*], but
that would take a follow-up fix that is much more involved, so let's
not go there (yet).  At least buffering a single line's worth of
output in a buffer and writing it out in a single write() would get
us much closer to solving the "mixed up output" from multiple
processes problem the current code seems to already have.
> Also, if there is any
> unexpected error in between updating references which causes us to die,
> then we wouldn't have written the already updated references to
> FETCH_HEAD now.

That one may be a valid concern.

Thanks.

[Footnote]

*1* "git fetch --all/--multiple" was a strictly serial operation to
    avoid such a mixed-up output problem, but perhaps we weren't
    careful enough when we introduced parallel fetch and broke it?

*2* When fetching from a single remote, the code makes sure that a
    record that is not marked as 'not-for-merge' is listed first by
    reordering the records, but it does not work well when fetching
    from multiple remotes.  Queuing all in the buffer before showing
    them would give us an opportunity to sort, but as I said, it
    takes coordination among the processes---instead of each process
    writing to FETCH_HEAD on their own, somebody has to collect the
    records from them, reorder as needed, and write them all out.

    cf. https://lore.kernel.org/git/X8oL190Vl03B0cQ%2F@xxxxxxxxxxxxxxxxxxxxxxx/