RE: [PATCH] read-cache: fix index corruption with index v4

Kevin Willford <kewillf@xxxxxxxxxxxxx> · Tue, 5 Sep 2017 19:37:28 +0000

> From: Thomas Gummerer [mailto:t.gummerer@xxxxxxxxx]
> Sent: Monday, September 4, 2017 4:58 PM
> 
> ce012deb98 ("read-cache: avoid allocating every ondisk entry when
> writing", 2017-08-21) changed the way cache entries are written to the
> index file.  While previously it wrote the name to an struct that was
> allocated using xcalloc(), it now uses ce_write() directly.  Previously
> ce_namelen - common bytes were written to the cache entry, which would
> automatically make it nul terminated, as it was allocated using calloc.
> 
> Now we are writing ce_namelen - common + 1 bytes directly from the
> ce->name to the index.  As ce->name is however not nul terminated, and
> index-v4 needs the nul terminator to split between one index entry and
> the next, this would end up in a corrupted index.
> 
> Fix that by only writing ce_namelen - common bytes directly from
> ce->name to the index, and adding the nul terminator in an extra call to
> ce_write.
> 
> This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in
> config.mak and running the test suite (t1700 specifically broke).
> 
> Signed-off-by: Thomas Gummerer <t.gummerer@xxxxxxxxx>
> ---
> 
> I unfortunately didn't have more time to dig so
> 
> > As ce->name is however not nul terminated
> 
> just comes from my memory and from the patch below actually fixing the
> corruption, so it's really the most likely cause.  Would be great if
> someone who can remember more about the index could confirm that this
> is indeed the case.
> 

Digging into this and ce->name IS nul terminated.  The issue comes in when
the CE_STRIP_NAME is set, which is only set when using a split index. 
This sets the ce->ce_namelen = 0 without changing the actual ce->name buffer.
When writing the entry for the split index version 4 it was using the first character
in the ce->name buffer because of the + 1, which obviously isn't correct.    Before
it was using a newly allocated name buffer from the ondisk struct which was
allocated based on the ce_namelen of zero.

>  read-cache.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 40da87ea71..80830ddcfc 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2103,7 +2103,9 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct
> cache_entry *ce,
>  		if (!result)
>  			result = ce_write(c, fd, to_remove_vi, prefix_size);
>  		if (!result)
> -			result = ce_write(c, fd, ce->name + common,
> ce_namelen(ce) - common + 1);
> +			result = ce_write(c, fd, ce->name + common,
> ce_namelen(ce) - common);
> +		if (!result)
> +			result = ce_write(c, fd, "\0", 1);

You could use the padding variable here as well which is used in the < version 4
ce_write.

> 
>  		strbuf_splice(previous_name, common, to_remove,
>  			      ce->name + common, ce_namelen(ce) - common);
> --
> 2.14.1.480.gb18f417b89

While looking at the code I was wondering if we could get around the
whole setting ce->ce_namelen to zero bit but that would be much bigger
patch and possibly introduce other bugs so this seems the appropriate
fix for now.

Thanks for finding this!
Kevin