Re: Clone fails on a repo with too many heads/tags

Ivan Todoroski <grnch_lists <at> gmx.net> writes:
> Now share this repo using the Smart HTTP transport (git-http-backend) and
> then try cloning it in a different directory. This is what you would get:
> 
> $ git clone http://localhost/.../too-many-refs/.git
> Cloning into 'too-many-refs'...
> fatal: cannot exec 'fetch-pack': Argument list too long
> 
> [...]
> 
> The solution is conceptually simple: if the list of refs results in a too
> long command line, split the refs in batches and call fetch-pack multiple
> times such that each call is under the cmdline limit:
> 
> git fetch-pack --stateless-rpc --lock-pack ...<first batch of refs>...
> git fetch-pack --stateless-rpc --lock-pack ...<second batch of refs>...
> ...
> git fetch-pack --stateless-rpc --lock-pack ...<last batch of refs>...


BTW, I didn't want to sound like I am expecting or demanding a fix. If the 
experienced Git devs lack the time or inclination to work on this bug 
(understandable), I am certainly willing to try it myself. My C skills are a 
bit rusty and I'm not very familiar with the Git codebase, but I will do my 
best to follow Documentation/SubmittingPatches as well as the existing code 
structure.

I will need a few pointers to get me started in the right direction though...


1) Is splitting the cmdline in batches and executing fetch-pack multiple
times the right approach? If you have another solution in mind, please
suggest it. (A rough sketch of what I mean is under question 4 below.)


2) Should I add the test case for this bug to existing scripts like
t/t5551-http-fetch.sh and t/t5561-http-backend.sh, or should I create a new
test script under t/ following their example? There will probably be only
one test case for this bug, basically the script I pasted in the original
email to reproduce it.
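
For concreteness, here is roughly what a standalone script might look like,
modeled on the existing HTTP tests (untested; the ref count and the shape
of the setup are placeholders, and the real script would reuse the
reproduction recipe from my original mail):

#!/bin/sh

test_description='cloning a repository with a very large number of refs'

. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd

test_expect_success 'set up repository with many tags' '
	git init --bare "$HTTPD_DOCUMENT_ROOT_PATH/too-many-refs.git" &&
	test_commit base &&
	i=0 &&
	while test $i -lt 50000
	do
		git tag "v$i" &&
		i=$((i + 1)) ||
		return 1
	done &&
	git push --mirror "$HTTPD_DOCUMENT_ROOT_PATH/too-many-refs.git"
'

test_expect_success 'clone over smart HTTP succeeds' '
	git clone "$HTTPD_URL/smart/too-many-refs.git" too-many-refs
'

stop_httpd
test_done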


3) What would be the most portable way to get the cmdline length limit on
both POSIX and Windows? Would something like this be acceptable:

#ifdef _WIN32
	long cmdline_limit = 32767;	/* CreateProcess() limit */
#else
	long cmdline_limit = sysconf(_SC_ARG_MAX);	/* needs <unistd.h> */
#endif

I couldn't find a Windows API that reports the cmdline limit, but this blog
post by one of the Windows developers documents the value:

http://blogs.msdn.com/b/oldnewthing/archive/2003/12/10/56028.aspx
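
One wrinkle I noticed while reading up on sysconf(): on POSIX systems the
ARG_MAX limit covers the environment block as well as the argument list,
so the usable budget for argv is smaller than the raw value. I imagine the
limit would have to be derated along these lines (the helper is my own
sketch, not existing Git code):

#include <string.h>
#include <unistd.h>

extern char **environ;

/*
 * Rough usable command line budget: ARG_MAX minus the space the
 * environment already takes, with some slack for the pointer arrays
 * and padding that the kernel also counts against the limit.
 */
static long usable_cmdline_limit(void)
{
	long limit = sysconf(_SC_ARG_MAX);
	char **e;

	if (limit < 0)
		limit = 4096;	/* _POSIX_ARG_MAX as a conservative fallback */
	for (e = environ; *e; e++)
		limit -= (long)strlen(*e) + 1;
	return limit - 2048;	/* arbitrary safety margin */
}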


4) Should this problem be fixed only in remote-curl.c:fetch_git() or should it 
be solved more generally in run-command.c:start_command(), which is used by 
fetch_git() for the actual invocation?

If this is fixed only in remote-curl.c:fetch_git(), then the same logic
would need to be open-coded in every other place with the same pattern.
Are you aware of any other internal sub-commands that put all refs on the
command line and could be susceptible to the same issue?


If it's fixed at a lower level in run-command.c:start_command(), the logic 
would become available to any other sub-command that needs it.

However, this would mean that struct child_process as well as struct
rpc_state would need an additional field indicating whether the command is
safe to execute in multiple batches, and how many of the leading arguments
in child_process.argv must be preserved on every invocation (the switches
and such).

Something like child_process.split_after, which, if non-zero, would mean
that start_command() is free to invoke the command multiple times when
argv exceeds the cmdline limit, grouping the arguments after
argv[split_after] into smaller batches for each invocation.
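
To make that concrete, here is roughly the shape I have in mind (all of
these names are made up for illustration; this is not the existing
run-command API, and the real struct child_process has many more fields):

#include <stdio.h>
#include <string.h>

/*
 * Illustration only: stand-ins for struct child_process and
 * start_command(), showing how a split_after field could drive the
 * batching. split_after == 0 keeps today's single-exec behavior.
 */
struct child_process_sketch {
	const char **argv;	/* NULL-terminated argument list */
	int split_after;	/* argv[0..split_after-1] repeat in every batch */
};

static size_t argv_bytes(const char **argv, int from, int to)
{
	size_t n = 0;
	int i;

	for (i = from; i < to; i++)
		n += strlen(argv[i]) + 1;	/* arg plus its terminating NUL */
	return n;
}

/* Stand-in for the real exec: print the prefix plus one slice of the tail. */
static void spawn_batch(const char **argv, int prefix, int from, int to)
{
	int i;

	for (i = 0; i < prefix; i++)
		printf("%s ", argv[i]);
	for (i = from; i < to; i++)
		printf("%s ", argv[i]);
	printf("\n");
}

static void start_command_sketch(struct child_process_sketch *cmd, size_t limit)
{
	int argc = 0, from;
	size_t base;

	while (cmd->argv[argc])
		argc++;

	if (!cmd->split_after || argv_bytes(cmd->argv, 0, argc) <= limit) {
		spawn_batch(cmd->argv, argc, 0, 0);	/* one exec, as today */
		return;
	}

	base = argv_bytes(cmd->argv, 0, cmd->split_after);
	from = cmd->split_after;
	while (from < argc) {
		size_t used = base;
		int to = from;

		while (to < argc &&
		       used + strlen(cmd->argv[to]) + 1 <= limit) {
			used += strlen(cmd->argv[to]) + 1;
			to++;
		}
		if (to == from)
			to++;	/* a single oversized argument: pass it alone */
		spawn_batch(cmd->argv, cmd->split_after, from, to);
		from = to;
	}
}

int main(void)
{
	const char *argv[] = {
		"git", "fetch-pack", "--stateless-rpc", "--lock-pack",
		"refs/tags/v1", "refs/tags/v2", "refs/tags/v3", "refs/tags/v4",
		NULL
	};
	struct child_process_sketch cmd = { argv, 4 };

	start_command_sketch(&cmd, 75);	/* tiny limit to force two batches */
	return 0;
}

With the toy limit above this prints two invocations, each repeating the
four leading switches with half of the refs.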

