Here is a relatively small reroll of mine and Ævar's series to prevent packfile-related races when moving the `.idx` into place before other auxiliary files are renamed. All changes are cosmetic, but all of the feedback on the previous round was a strict improvement in the overall quality, so it seemed pertinent to send an improved version. Range-diff is below, and thanks in advance for your review. Taylor Blau (4): bulk-checkin.c: store checksum directly pack-write.c: rename `.idx` files after `*.rev` builtin/repack.c: move `.idx` files into place last builtin/index-pack.c: move `.idx` files into place last Ævar Arnfjörð Bjarmason (5): pack.h: line-wrap the definition of finish_tmp_packfile() pack-write: refactor renaming in finish_tmp_packfile() index-pack: refactor renaming in final() pack-write: split up finish_tmp_packfile() function pack-objects: rename .idx files into place after .bitmap files builtin/index-pack.c | 48 +++++++++++++++++------------------ builtin/pack-objects.c | 15 ++++++++--- builtin/repack.c | 2 +- bulk-checkin.c | 31 +++++++++++++++++------ pack-write.c | 57 +++++++++++++++++++++--------------------- pack.h | 10 +++++++- 6 files changed, 96 insertions(+), 67 deletions(-) Range-diff against v1: -: ---------- > 1: 0b07aa4947 pack.h: line-wrap the definition of finish_tmp_packfile() 1: 20b35ce050 ! 2: c46d3c29b4 bulk-checkin.c: store checksum directly @@ Commit message Store the hash directly in an unsigned char array. This behaves the same as writing to the `hash` member, but makes the intent clearer (and avoids allocating an extra four bytes for the `algo` member of `struct - object_id`). + object_id`). It likewise prevents the possibility of a segfault when + reading `algo` (e.g., by calling `oid_to_hex()`) if it is uninitialized. Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> 2: 35052ef494 ! 3: 2e907e309d pack-write: refactor renaming in finish_tmp_packfile() @@ pack-write.c: struct hashfile *create_tmp_packfile(char **pack_tmp_name) return hashfd(fd, *pack_tmp_name); } -+static void rename_tmp_packfile(struct strbuf *nb, const char *source, ++static void rename_tmp_packfile(struct strbuf *name_prefix, const char *source, + const char *ext) +{ -+ size_t nb_len = nb->len; ++ size_t name_prefix_len = name_prefix->len; + -+ strbuf_addstr(nb, ext); -+ if (rename(source, nb->buf)) -+ die_errno("unable to rename temporary '*.%s' file to '%s'", -+ ext, nb->buf); -+ strbuf_setlen(nb, nb_len); ++ strbuf_addstr(name_prefix, ext); ++ if (rename(source, name_prefix->buf)) ++ die_errno("unable to rename temporary file to '%s'", ++ name_prefix->buf); ++ strbuf_setlen(name_prefix, name_prefix_len); +} + void finish_tmp_packfile(struct strbuf *name_buffer, 3: 0fb2c25f5a = 4: 41d34b1f18 pack-write.c: rename `.idx` files after `*.rev` 4: 3b10a97ec0 = 5: 6b340b7eba builtin/repack.c: move `.idx` files into place last 5: 3c9b515907 ! 6: 1e4d0ea0f3 index-pack: refactor renaming in final() @@ Commit message helper, this is both a net decrease in lines, and improves the readability, since we can easily see at a glance that the logic for writing these three types of files is exactly the same, aside from the - obviously differing cases of "*final_xyz_name" being NULL, and - "else_chmod_if" being different. + obviously differing cases of "*final_name" being NULL, and + "make_read_only_if_same" being different. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> @@ builtin/index-pack.c: static void write_special_file(const char *suffix, const c strbuf_release(&name_buf); } -+static void rename_tmp_packfile(const char **final_xyz_name, -+ const char *curr_xyz_name, -+ struct strbuf *xyz_name, unsigned char *hash, -+ const char *ext, int else_chmod_if) ++static void rename_tmp_packfile(const char **final_name, ++ const char *curr_name, ++ struct strbuf *name, unsigned char *hash, ++ const char *ext, int make_read_only_if_same) +{ -+ if (*final_xyz_name != curr_xyz_name) { -+ if (!*final_xyz_name) -+ *final_xyz_name = odb_pack_name(xyz_name, hash, ext); -+ if (finalize_object_file(curr_xyz_name, *final_xyz_name)) ++ if (*final_name != curr_name) { ++ if (!*final_name) ++ *final_name = odb_pack_name(name, hash, ext); ++ if (finalize_object_file(curr_name, *final_name)) + die(_("unable to rename temporary '*.%s' file to '%s"), -+ ext, *final_xyz_name); -+ } else if (else_chmod_if) { -+ chmod(*final_xyz_name, 0444); ++ ext, *final_name); ++ } else if (make_read_only_if_same) { ++ chmod(*final_name, 0444); + } +} + 6: 8d67a71501 = 7: 906e75d707 builtin/index-pack.c: move `.idx` files into place last 7: 5c553229b0 ! 8: 90bebe4e51 pack-write: split up finish_tmp_packfile() function @@ bulk-checkin.c: static struct bulk_checkin_state { unsigned char hash[GIT_MAX_RAWSZ]; ## pack-write.c ## -@@ pack-write.c: static void rename_tmp_packfile(struct strbuf *nb, const char *source, - strbuf_setlen(nb, nb_len); +@@ pack-write.c: static void rename_tmp_packfile(struct strbuf *name_prefix, const char *source, + strbuf_setlen(name_prefix, name_prefix_len); } -void finish_tmp_packfile(struct strbuf *name_buffer, 8: d8286cf107 ! 9: 1409725509 pack-objects: rename .idx files into place after .bitmap files @@ Commit message about filesystem races in the face of doing and not doing fsync() (and if doing fsync(), not doing it properly). - In particular, in this case of writing to ".git/objects/pack" we only - write and fsync() the individual files, but if we wanted to guarantee - that the metadata update was seen in that way by concurrent processes - we'd need to fsync() on the "fd" of the containing directory. That - concern is probably more theoretical than not, modern OS's tend to be - more on the forgiving side than the overly pedantic side of - implementing POSIX FS semantics. + We may want to fsync the containing directory once after renaming the + *.idx file into place, but that is outside the scope of this series. 1. https://lore.kernel.org/git/8735qgkvv1.fsf@xxxxxxxxxxxxxxxxxxx/ -- 2.33.0.96.g73915697e6