Re: [PATCH v2] fetch-pack: optionally save packs to disk

On Fri, Jun 12, 2015 at 2:22 AM, Johannes Sixt <j6t@xxxxxxxx> wrote:
>
> On 11.06.2015 at 20:59, Augie Fackler wrote:
>>
>> When developing server software, it's often helpful to save a
>> potentially-bogus pack for later analysis. This makes that trivial,
>> instead of painful.
>
>
> When you develop server software, shouldn't you test drive the server via
> the bare metal protocol anyway? That *is* painful, but unavoidable because
> you must harden the server against any garbage that a potentially malicious
> client could throw at it. Restricting yourself to a well-behaved client
> such as fetch-pack is only half the deal.


We do that too, but sometimes I've encountered an edge case that's
trivially reproduced from an existing repo, and going through the work to
manually drive the server is a monumental pain in the butt, and all I
*really* need is to see the bytes sent from the server to the client. If it
weren't for SSL-everywhere, I'd probably just do this with wireshark, but
that's not the world I live in.

>
>
> That said, I do think that fetch-pack could learn a mode that makes it
> easier to debug the normal behavior of a server (if such a mode is missing
> currently).
>
> What is the problem with the current fetch-pack implementation? Does it
> remove a bogus packfile after download? Does it abort during download when
> it detects a broken packfile? Does --keep not do what you need?


fetch-pack doesn't store the pack anywhere - it's sending it to index-pack
(or unpack-objects) using --stdin, which means that the raw bytes from the
server currently are never materialized anywhere on disk. Having index-pack
do this is too late, because it's doing things like rewriting the pack
header in a potentially new format.

(Junio also covered this well, thanks!)
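
To make "raw bytes from the server" a bit more concrete: a pack stream starts with a 12-byte header (the "PACK" signature, a version, and an object count, each a 4-byte big-endian word), and that is what read_pack_header() has already consumed from demux.out in the unpack_limit case. Below is a minimal standalone sketch of decoding those 12 bytes. The name parse_pack_header() is made up for illustration and is not git's API:

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): decode the 12-byte header
 * at the start of a pack stream.
 * Returns 0 and fills *version and *entries on success,
 * -1 if the buffer is too short or lacks the "PACK" signature.
 */
static int parse_pack_header(const unsigned char *buf, size_t len,
			     uint32_t *version, uint32_t *entries)
{
	uint32_t words[3];

	if (len < sizeof(words))
		return -1;
	if (memcmp(buf, "PACK", 4) != 0)
		return -1;

	/* Copy out to avoid unaligned access, then convert from network order. */
	memcpy(words, buf, sizeof(words));
	*version = ntohl(words[1]);
	*entries = ntohl(words[2]);
	return 0;
}
```

Once those bytes are gone from the pipe, anything that only captures the remainder of the stream ends up with a headerless (and so broken) file, which is why the patch writes the header out first.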

>
>
> Instead of your approach (which forks off tee to dump a copy of the
> packfile), would it not be simpler to add an option --debug-pack (probably
> not the best name) that skips the cleanup step when a broken packfile is
> detected and prints the name of the downloaded packfile?
>
>
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index a912935..fe6ba58 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -684,7 +684,7 @@ static int get_pack(struct fetch_pack_args *args,
>>         const char *argv[22];
>>         char keep_arg[256];
>>         char hdr_arg[256];
>> -       const char **av, *cmd_name;
>> +       const char **av, *cmd_name, *savepath;
>>         int do_keep = args->keep_pack;
>>         struct child_process cmd = CHILD_PROCESS_INIT;
>>         int ret;
>> @@ -708,9 +708,8 @@ static int get_pack(struct fetch_pack_args *args,
>>         cmd.argv = argv;
>>         av = argv;
>>         *hdr_arg = 0;
>> +       struct pack_header header;
>>         if (!args->keep_pack && unpack_limit) {
>> -               struct pack_header header;
>> -
>>                 if (read_pack_header(demux.out, &header))
>>                         die("protocol error: bad pack header");
>>                 snprintf(hdr_arg, sizeof(hdr_arg),
>> @@ -762,7 +761,44 @@ static int get_pack(struct fetch_pack_args *args,
>>                 *av++ = "--strict";
>>         *av++ = NULL;
>>
>> -       cmd.in = demux.out;
>> +       savepath = getenv("GIT_SAVE_FETCHED_PACK_TO");
>> +       if (savepath) {
>> +               struct child_process cmd2 = CHILD_PROCESS_INIT;
>> +               const char *argv2[22];
>> +               int pipefds[2];
>> +               int e;
>> +               const char **av2;
>> +               cmd2.argv = argv2;
>> +               av2 = argv2;
>> +               *av2++ = "tee";
>> +               if (*hdr_arg) {
>> +                       /* hdr_arg being nonempty means we already read the
>> +                        * pack header from demux, so we need to drop a pack
>> +                        * header in place for tee to append to, otherwise
>> +                        * we'll end up with a broken pack on disk.
>> +                        */
>
>
>                         /*
>                          * Write multi-line comments
>                          * like this (/* on its own line)
>                          */
>
>> +                       int fp;
>> +                       struct sha1file *s;
>> +                       fp = open(savepath, O_CREAT | O_TRUNC | O_WRONLY, 0666);
>> +                       s = sha1fd_throughput(fp, savepath, NULL);
>> +                       sha1write(s, &header, sizeof(header));
>> +                       sha1flush(s);
>
>
> Are you abusing sha1write() and sha1flush() to write a byte sequence to a
> file? Is write_in_full() not sufficient?


I didn't know about write_in_full. I'm very much *not* familiar with git's
codebase - I know the protocols and formats reasonably well, but have
needed only occasionally to look at the code. That works, thanks.
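
For anyone else who hadn't run into it: my understanding is that write_in_full() is essentially a "keep calling write() until everything is out" loop. The sketch below is my own standalone approximation of that idea, not git's actual implementation, and write_all() is a made-up name:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the idea behind write_in_full():
 * retry write() until all `count` bytes are written.
 * Returns `count` on success, -1 on error with errno set.
 */
static ssize_t write_all(int fd, const void *buf, size_t count)
{
	const char *p = buf;
	size_t left = count;

	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		if (n == 0) {
			/* No progress; report it as out of space. */
			errno = ENOSPC;
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}
	return (ssize_t)count;
}
```

With something like this, dropping the already-read header into the save file is a single call along the lines of write_all(fd, &header, sizeof(header)), with no sha1file involved.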

>
>
>
>> +                       close(fp);
>> +                       /* -a is supported by both GNU and BSD tee */
>> +                       *av2++ = "-a";
>> +               }
>> +               *av2++ = savepath;
>> +               *av2++ = NULL;
>> +               cmd2.in = demux.out;
>> +               e = pipe(pipefds);
>> +               if (e != 0)
>> +                       die("couldn't make pipe to save pack");
>
>
> start_command() can create the pipe for you. Just say cmd2.out = -1.
>
>> +               cmd2.out = pipefds[1];
>> +               cmd.in = pipefds[0];
>> +               if (start_command(&cmd2))
>> +                       die("couldn't start tee to save a pack");
>
>
> When you call start_command(), you must also call finish_command().
> start_command() prints an error message for you; you don't have to do that
> (the start_command() in the context below is a bad example).

I looked around, and there are nonzero exit paths from start_command()
which do not print an error and die, so this seems safer. It's also in line
with the vast majority of uses of start_command in the codebase, so I left
this as-is. If you've got something specific you'd like to see here
instead, do let me know (presumably I still need to check the error code
from start_command()...)

>
>
>
>> +       } else
>> +               cmd.in = demux.out;
>>         cmd.git_cmd = 1;
>>         if (start_command(&cmd))
>>                 die("fetch-pack: unable to fork off %s", cmd_name);

[snip some good comments about test cleanups, all addressed]

>
> -- Hannes
>

I'll wait to mail a v3 at least until I know what's going on with
start_command() and error checking, and perhaps until I get consensus on the
use of tee vs something else to save the bytes from the server.
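
For the sake of discussion, "something else" could be as small as a copy loop that duplicates each chunk to both the consumer and the save file. This is only a rough standalone sketch and not what the patch currently does; copy_to_both() and put_all() are hypothetical names, and where such a loop would run (a helper process or an async thread) is left open:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `left` bytes to fd, retrying on EINTR. */
static int put_all(int fd, const char *p, size_t left)
{
	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical tee replacement: read `in` until EOF and write every
 * chunk to both `save` (the on-disk copy) and `out` (the consumer,
 * e.g. the pipe feeding index-pack). Returns 0 on EOF, -1 on error.
 */
static int copy_to_both(int in, int out, int save)
{
	char buf[8192];

	for (;;) {
		ssize_t n = read(in, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return 0;
		/* Save first so the disk copy has the bytes even if the consumer dies. */
		if (put_all(save, buf, (size_t)n) < 0 ||
		    put_all(out, buf, (size_t)n) < 0)
			return -1;
	}
}
```

The upside compared to tee is no dependency on an external binary; the downside is that something still has to run the loop concurrently with index-pack.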

https://github.com/durin42/git/commit/save-pack has the current version of
the patch if you want to see where it stands now.

Thanks for the review!

Augie