Hi Duy, On Sun, 29 Apr 2018, Duy Nguyen wrote: > On Tue, Apr 24, 2018 at 8:50 AM, Elijah Newren <newren@xxxxxxxxx> wrote: > > Currently, all callers of unpack_trees() set o->src_index == o->dst_index. > > The code in unpack_trees() does not correctly handle them being different. > > There are two separate issues: > > > > First, there is the possibility of memory corruption. Since > > unpack_trees() creates a temporary index in o->result and then discards > > o->dst_index and overwrites it with o->result, in the special case that > > o->src_index == o->dst_index, it is safe to just reuse o->src_index's > > split_index for o->result. However, when src and dst are different, > > reusing o->src_index's split_index for o->result will cause the > > split_index to be shared. If either index then has entries replaced or > > removed, it will result in the other index referring to free()'d memory. > > > > Second, we can drop the index extensions. Previously, we were moving > > index extensions from o->dst_index to o->result. Since o->src_index is > > the one that will have the necessary extensions (o->dst_index is likely to > > be a new index temporary index created to store the results), we should be > > moving the index extensions from there. > > > > Signed-off-by: Elijah Newren <newren@xxxxxxxxx> > > --- > > > > Differences from v2: > > - Don't NULLify src_index until we're done using it > > - Actually built and tested[1] > > > > But it now passes the testsuite on both linux and mac[2], and I even re-merged > > all 53288 merge commits in linux.git (with a merge of this patch together with > > the directory rename detection series) for good measure. [Only 7 commits > > showed a difference, all due to directory rename detection kicking in.] > > > > [1] Turns out that getting all fancy with an m4.10xlarge and nice levels of > > parallelization are great until you realize that your new setup omitted a > > critical step, leaving you running a slightly stale version of git instead... > > :-( > > > > [2] Actually, I get two test failures on mac from t0050-filesystem.sh, both > > with unicode normalization tests, but those two tests fail before my changes > > too. All the other tests pass. > > > > unpack-trees.c | 19 +++++++++++++++---- > > 1 file changed, 15 insertions(+), 4 deletions(-) > > > > diff --git a/unpack-trees.c b/unpack-trees.c > > index e73745051e..49526d70aa 100644 > > --- a/unpack-trees.c > > +++ b/unpack-trees.c > > @@ -1284,9 +1284,20 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options > > o->result.timestamp.sec = o->src_index->timestamp.sec; > > o->result.timestamp.nsec = o->src_index->timestamp.nsec; > > o->result.version = o->src_index->version; > > - o->result.split_index = o->src_index->split_index; > > - if (o->result.split_index) > > + if (!o->src_index->split_index) { > > + o->result.split_index = NULL; > > + } else if (o->src_index == o->dst_index) { > > + /* > > + * o->dst_index (and thus o->src_index) will be discarded > > + * and overwritten with o->result at the end of this function, > > + * so just use src_index's split_index to avoid having to > > + * create a new one. > > + */ > > + o->result.split_index = o->src_index->split_index; > > o->result.split_index->refcount++; > > + } else { > > + o->result.split_index = init_split_index(&o->result); > > + } > > hashcpy(o->result.sha1, o->src_index->sha1); > > o->merge_size = len; > > mark_all_ce_unused(o->src_index); > > @@ -1401,7 +1412,6 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options > > } > > } > > > > - o->src_index = NULL; > > ret = check_updates(o) ? (-2) : 0; > > if (o->dst_index) { > > if (!ret) { > > @@ -1412,12 +1422,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options > > WRITE_TREE_SILENT | > > WRITE_TREE_REPAIR); > > } > > - move_index_extensions(&o->result, o->dst_index); > > + move_index_extensions(&o->result, o->src_index); > > While this looks like the right thing to do on paper, I believe it's > actually broken for a specific case of untracked cache. In short, > please do not touch this line. I will send a patch to revert > edf3b90553 (unpack-trees: preserve index extensions - 2017-05-08), > which essentially deletes this line, with proper explanation and > perhaps a test if I could come up with one. > > When we update the index, we depend on the fact that all updates must > invalidate the right untracked cache correctly. In this unpack > operations, we start copying entries over from src to result. Since > 'result' (at least from the beginning) does not have an untracked > cache, it has nothing to invalidate when we copy entries over. By the > time we have done preparing 'result', what's recorded in src's (or > dst's for that matter) untracked cache may or may not apply to > 'result' index anymore. This copying only leads to more problems when > untracked cache is used. Is there really no way to invalidate just individual entries? I have a couple of worktrees which are *huge*. And edf3b90553 really helped relieve the pain a bit when running `git status`. Now you say that even a `git checkout -b new-branch` would blow the untracked cache away again? It would be *really* nice if we could prevent that performance regression somehow. Ciao, Dscho