Re: [Patch] Prevent cloning over http from spewing

Tay Ray Chuan <rctay89@xxxxxxxxx> · Wed, 10 Jun 2009 22:03:10 +0800

Hi,

On Mon, Jun 8, 2009 at 8:24 PM, Jeff King<peff@xxxxxxxx> wrote:
> Thanks, I just looked at it (though sadly it does not merge into what is
> in 'next' right now).

Thanks for taking the time to try it out (and with linux-2.6, at that).

> My first complaint is that it is way too long. It wrapped in my
> 80-column terminal, causing all sorts of visual confusion.

The byte counts can really take up a lot of space. Perhaps we should
just show the size (MiB) and completed percentage, sans byte counts?

>> 1. %d objects = number of concurrent objects being fetched, usually
>> around 4-5. Since objects are fetched alongside other files like packs
>> and pack indices, I separated this from (4).
>
> This did at times say '4' for me, but just as often it said '0' (even
> when stuff was obviously downloading). I hadn't thought about the fact
> that we have concurrent downloads. That really makes things harder.
> Though it seems like we only do one pack file at a time (so maybe that
> is the reason for the '0' -- we are downloading a pack).

Fetching of objects and fetching of packs take place separately, so
'0' objects being fetched doesn't necessarily mean we're fetching
something else (e.g. packs). Perhaps we should hide the "Fetching 0
objects" part when the number of simultaneous object fetches is 0?

> In fact, while watching the progress go for the linux-2.6 download, it
> is really hard to tell what is going on. The "% completed" number jumps
> around between multiple values, even showing what appears to be
> nonsense:
>
>  Fetching 0 objects (got 2 of 320, 0 alt) and pack:   8%
>  (241096602/302989431), 229.78 MiB | 668 KiB/s
>  ...
>  Fetching 0 objects (got 2 of 320, 0 alt) and pack:   4%
>  (270741882/302989431), 257.93 MiB | 690 KiB/s
>
> Those are two cut-and-pastes from the same fetch. You can see it
> progressing in terms of absolute numbers, but the percentage values make
> no sense. The "total downloaded" and throughput numbers look roughly
> correct. I don't know if this is caused by multiple simultaneous
> downloads.

Simultaneous downloads (as you guessed) are a likely cause of the
inconsistent % completed, but packs aren't downloaded simultaneously,
so I'll have to look into this again.

> Fetching linux-2.6, I spent a very long time on "got 2 of 320" which
> really wasn't all that helpful (because almost the whole thing is in
> packs). Probably a pack count would be useful there. Though I wonder if
> there is some shorter way to summarize what is going on to keep the line
> smaller.

We are of course assuming that the user is fetching from a well-packed
repo. Again, perhaps we could cut out the "Fetching 0 objects" part.

> But somewhat worse is that we start at '320', spend a lot of time, and
> then magically it ends up at 1182387 at the end. So it is not all that
> useful as a progress counter, because we don't actually know the total.
> So we can show that we are progressing, but the end keeps getting
> farther away. :)

The total number of objects (320) increases as we "walk" the commits;
sometimes we need to fetch the "walked" objects, sometimes we don't
(e.g. when they are in packs we've already fetched). There's no way to
know the total in advance, hence the continually updated total. I
don't think that's a problem; the idea is to reassure the user that
git is still active.

>> 3. %d alt = number of alternate objects. The way I'm counting them now
>> is very inaccurate; I may drop this if it's too complicated to do an
>> accurate count. I added this because some people use forked repos, and
>> they may wonder why after some time, the number of objects fetched
>> doesn't increase. (The time was spent on waiting for the server, only
>> for it to return a 404).
>
> In the name of conserving space on the line, perhaps you should just
> count this as a "fetched" object and increment the fetched count by one.
> The user doesn't have to care which were alt and which were not, as long
> as they see a counter progressing towards completion.

Ok. I'll just drop this then. (The way I'm doing it right now isn't
very accurate: the "alt" count increases the moment git realises the
object might be found at the alternate location, not the moment the
object at the alternate location is fetched.)

> I wonder if you should start a newline every time we get to a new
> "phase". So you might see:
>
>  Downloading %d loose objects: Z% (X/Y), x MiB | y KiB/s, done
>  Fetching pack 1 of 2: Z% (X/Y), x MiB | y KiB/s, done
>  Verifying pack 1 of 2: Z% (X/Y)
>  Fetching pack 2 of 2: Z% (X/Y), x MiB | y KiB/s, done
>  Verifying pack 2 of 2: Z% (X/Y)
>
> That assumes we download packs one at a time (is that right?). It does take
> a couple of lines to show what is going on, but I think most repos are
> only going to have a couple of packs (though in theory, you could have
> more "loose objects" lines interspersed with your packs).

Yeah, we do download packs one at a time (as I said above).

-- 
Cheers,
Ray Chuan