V2 replaces an #if'd memset with some brace initialisation (patch 3/3) as per review comments. I've also removed an irrelevant "technically" from commit message 2/3, and fixed a typo in commit message 3/3. Andrzej Hunt (3): bulk-checkin: make buffer reuse more obvious and safer split-index: use oideq instead of memcmp to compare object_id's builtin/checkout--worker: zero-initialise struct to avoid MSAN complaints builtin/checkout--worker.c | 2 +- bulk-checkin.c | 3 +-- split-index.c | 3 ++- 3 files changed, 4 insertions(+), 4 deletions(-) base-commit: 62a8d224e6203d9d3d2d1d63a01cf5647ec312c9 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1033%2Fahunt%2Fmsan-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1033/ahunt/msan-v2 Pull-Request: https://github.com/git/git/pull/1033 Range-diff vs v1: 1: 7659d4bf13c2 = 1: 7659d4bf13c2 bulk-checkin: make buffer reuse more obvious and safer 2: 14b0d5dd7fce ! 2: 6943eb511bee split-index: use oideq instead of memcmp to compare object_id's @@ Commit message include that field when calling memcmp on a subset of the cache_entry. Depending on which hashing algorithm is being used, only part of object_id.hash is actually being used, therefore including it in a - memcmp() is technically incorrect. Instead we choose to exclude the - object_id when calling memcmp(), and call oideq() separately. + memcmp() is incorrect. Instead we choose to exclude the object_id when + calling memcmp(), and call oideq() separately. This issue was found when running t1700-split-index with MSAN, see MSAN output below (on my machine, offset 76 corresponds to 4 bytes after the 3: cd1e1f6985c7 ! 3: 4bdc0b77f6f2 builtin/checkout--worker: memset struct to avoid MSAN complaints @@ Metadata Author: Andrzej Hunt <ajrhunt@xxxxxxxxxx> ## Commit message ## - builtin/checkout--worker: memset struct to avoid MSAN complaints + builtin/checkout--worker: zero-initialise struct to avoid MSAN complaints report_result() sends a struct to the parent process, but that struct - contains unintialised padding bytes. Running this code under MSAN - rightly triggers a warning - but we also don't care about this warning - because we control the receiving code, and we therefore know that those - padding bytes won't be read on the receiving end. Therefore we add a - memset to convince MSAN that this memory is safe to read - but only - when building with MSAN to avoid this cost in normal usage. + would contain uninitialised padding bytes. Running this code under MSAN + rightly triggers a warning - but we don't particularly care about this + warning because we control the receiving code, and we therefore know + that those padding bytes won't be read on the receiving end. + + We could simply suppress this warning under MSAN with the approporiate + ifdef'd attributes, but a less intrusive solution is to 0-initialise the + struct, which guarantees that the padding will also be initialised. Interestingly, in the error-case branch, we only try to copy the first two members of pc_item_result, by copying only PC_ITEM_RESULT_BASE_SIZE @@ Commit message after the end of the second last member. We could avoid doing this by redefining PC_ITEM_RESULT_BASE_SIZE as 'offsetof(second_last_member) + sizeof(second_last_member)', but there's - no huge benefit to doing so (and our memset hack silences the MSAN - warning in this scenario either way). + no huge benefit to doing so (and this patch silences the MSAN warning in + this scenario either way). MSAN output from t2080 (partially interleaved due to the parallel work :) ): @@ Commit message Signed-off-by: Andrzej Hunt <andrzej@xxxxxxxxx> ## builtin/checkout--worker.c ## -@@ builtin/checkout--worker.c: static void report_result(struct parallel_checkout_item *pc_item) - struct pc_item_result res; +@@ builtin/checkout--worker.c: static void packet_to_pc_item(const char *buffer, int len, + + static void report_result(struct parallel_checkout_item *pc_item) + { +- struct pc_item_result res; ++ struct pc_item_result res = { 0 }; size_t size; -+#if defined(__has_feature) -+# if __has_feature(memory_sanitizer) -+ // MSAN workaround: res contains padding bytes, which will remain -+ // permanently unintialised. Later, we read all of res in order to send -+ // it to the parent process - and MSAN (rightly) complains that we're -+ // reading those unintialised padding bytes. By memset'ing res we -+ // guarantee that there are no uninitialised bytes. -+ memset(&res, 0, sizeof(res)); -+#endif -+#endif -+ res.id = pc_item->id; - res.status = pc_item->status; - -- gitgitgadget