Re: [PATCH v2 0/4] Don't lazy-fetch commits when parsing them

Jonathan Tan <jonathantanmy@xxxxxxxxxx> · Thu, 1 Dec 2022 13:26:50 -0800

Jeff King <peff@xxxxxxxx> writes:
> On Thu, Dec 01, 2022 at 11:27:29AM -0800, Jonathan Tan wrote:
> 
> > Thanks everyone for your reviews. Here is a reroll with the requested change
> > (just one small one).
> 
> Thanks, this looks OK to me. However Junio noted in "What's cooking"
> that it seems to break CI on windows. The problem is in t5318.93:
> 
>   2022-12-01T09:26:44.8887018Z ++ cat test_err
>   2022-12-01T09:26:44.8887414Z error: Could not read 0000000000000000000000000000000000000000
>   2022-12-01T09:26:44.8887825Z error: Could not read 0000000000000000000000000000000000000000
>   2022-12-01T09:26:44.8888240Z error: Could not read 0000000000000000000000000000000000000000
>   2022-12-01T09:26:44.8888639Z error: Could not read 0000000000000000000000000000000000000000
>   2022-12-01T09:26:44.8889052Z error: Could not read 0000000000000000000000000000000000000000
>   2022-12-01T09:26:44.8889512Z error: Could not read 0000000000000000000000000000000000000000
>   2022-12-01T09:26:44.8889991Z fatal: failed to read object 0000000000000000000000000000000000000000: Function not implemented
>   2022-12-01T09:26:44.8890401Z ++ return 1
>   2022-12-01T09:26:44.8890761Z error: last command exited with $?=1
>   2022-12-01T09:26:44.8891263Z not ok 93 - corrupt commit-graph write (broken parent)
> 
> Looks like the check in die_if_corrupt() is seeing a different errno
> value than ENOENT. I wonder if we need to take more care to preserve it
> across calls. It does look like we hit the same sequence of functions
> that read_object_file_extended() did, but perhaps this was buggy all
> along, and you're now exposing it through a new code path.
> 
> In particular I wonder if obj_read_unlock() might be the culprit here,
> and something like this might help:
> 
> diff --git a/object-file.c b/object-file.c
> index 8adef99a7c..db2d35519e 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1641,9 +1641,12 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
>  			     struct object_info *oi, unsigned flags)
>  {
>  	int ret;
> +	int save_errno;
>  	obj_read_lock();
>  	ret = do_oid_object_info_extended(r, oid, oi, flags);
> +	save_errno = errno;
>  	obj_read_unlock();
> +	errno = save_errno;
>  	return ret;
>  }

Here is die_if_corrupt() up to the "failed to read object" check:

> void die_if_corrupt(struct repository *r,
>                     const struct object_id *oid,
>                     const struct object_id *real_oid)
> {
>         const struct packed_git *p;
>         const char *path;
>         struct stat st;
>
>         obj_read_lock();
>         if (errno && errno != ENOENT)
>                 die_errno(_("failed to read object %s"), oid_to_hex(oid));

I wonder if we could just remove this check. Even as it is, I don't think that
there is any guarantee that obj_read_lock() would not clobber errno. Removing
it makes all tests pass locally, but I haven't tried it on CI.

(One argument that could be made is that we shouldn't do any die_if_corrupt()
refactoring or other refactoring of the sort, because previously its contents
were part of a larger function and could thus rely on the errno set by whatever
ran just before. But I think that even without my patches, we couldn't rely on
it in the first place: looking at obj_read_lock(), it looks like it could init a
mutex, and depending on the implementation of that, it could clobber errno.)
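
To illustrate the concern, here is a minimal standalone sketch. It is not git's
code: demo_read_object(), demo_lock() and demo_unlock() are hypothetical
stand-ins, and the clobber is simulated by assigning ENOSYS (which is what
strerror() typically renders as "Function not implemented", as seen in the CI
log above). It only shows why checking errno after taking a lock is fragile,
and that capturing errno right after the failing call avoids the problem.

```c
#include <errno.h>

/* Hypothetical stand-in for a failed object lookup that sets ENOENT. */
static int demo_read_object(void)
{
	errno = ENOENT;
	return -1;
}

/*
 * Hypothetical stand-in for a lock helper. The assignment simulates a
 * clobber such as one caused by lazy mutex initialization; a real
 * implementation may or may not touch errno.
 */
static void demo_lock(void)
{
	errno = ENOSYS;
}

static void demo_unlock(void)
{
	/* nothing to do in this sketch */
}

/* Fragile: errno is inspected only after the lock has been taken. */
static int errno_seen_after_lock(void)
{
	int seen;

	demo_read_object();
	demo_lock();
	seen = errno;
	demo_unlock();
	return seen;
}

/* Safer: errno is captured immediately after the failing call. */
static int errno_captured_early(void)
{
	int saved;

	demo_read_object();
	saved = errno;
	demo_lock();
	demo_unlock();
	return saved;
}
```

With these stand-ins, errno_seen_after_lock() returns ENOSYS while
errno_captured_early() returns ENOENT, so a check of the form
"errno && errno != ENOENT" would only fire in the first variant.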