On Fri, Mar 14, 2014 at 05:21:59PM +0700, Duy Nguyen wrote:

> On Fri, Mar 14, 2014 at 4:43 PM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
> > Would it be practical to change it to a percentage of bytes written?
> > Then we'd have progress info that is both convenient *and* truthful.
>
> I agreed for a second, then remembered that we don't know the final
> pack size until we finish writing it.. Not sure if we could estimate
> (cheaply) with a good accuracy though.

Right. I'm not sure what Michael meant by "it". We can send a
percentage of bytes written for the reused pack (my option 2), but we
do not know the total bytes for the rest of the objects. So we'd end
up with two progress meters (one for the reused pack, and one for
everything else), both counting up to different endpoints. And it
would require quite a few changes to the progress code.

> If an object is reused, we already know its compressed size. If it's
> not reused and is a loose object, we could use on-disk size. It's a
> lot harder to estimate an not-reused, deltified object. All we have is
> the uncompressed size, and the size of each delta in the delta chain..
> Neither gives a good hint of what the compressed size would be.

Hmm. I think we do have the compressed delta size after having run the
compression phase (because that is ultimately what we compare to find
the best delta).

Loose objects are probably the hardest here, as we actually recompress
them (IIRC, because packfiles encode the type/size info outside of the
compressed bits, whereas it is inside for loose objects; the
"experimental loose" format harmonized this, but it never caught on).
Without doing that recompression, any value you came up with would be
an estimate, though it would be pretty close (not off by more than a
few bytes per object).

However, you can't just run through the packing list and add up the
object sizes; you'd need to do a real "dry-run" through the writing
phase. There are probably more I'm missing, but you need at least to
figure out:

  1. The actual compressed size of a full loose object, as described
     above.

  2. The variable-length headers for each object based on its type and
     size.

  3. The final form that the object will take based on what has come
     before. For example, if there is a max pack size, we may split an
     object from its delta base, in which case we have to throw away
     the delta. We don't know where those breaks will be until we walk
     through the whole list.

  4. If an object we attempt to reuse turns out to be corrupted, we
     fall back to the non-reuse code path, which will have a different
     size. So you'd need to actually check the reused object CRCs
     during the dry-run (and for local repacks, not transfers, we
     actually inflate and check the zlib, too, for safety).

So I think it's _possible_. But it's definitely not trivial. For now, I
think it makes sense to go with something like the patch I posted
earlier (which I'll re-roll in a few minutes). That fixes what is IMHO
a regression in the bitmaps case. And it does not make it any harder
for somebody to later convert us to a true byte-counter (i.e., it is
the easy half already).

-Peff
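To make item 2 and the overall "dry-run tally" idea concrete, here is a
minimal, self-contained sketch. It is not git's actual pack-objects
code: struct fake_entry, pack_header_len(), and estimate_packed_size()
are made-up stand-ins for the real packing list. Only the header
encoding rule (type in bits 4-6 of the first byte, size in the low 4
bits plus 7 bits per continuation byte), the 12-byte pack header, the
20-byte SHA-1 trailer, and the 20-byte REF_DELTA base id reflect the
real on-disk pack format; the OFS_DELTA varint offset is not modeled,
and loose-object payload sizes remain estimates for the reasons
discussed above.

#include <stdint.h>
#include <stdio.h>

enum entry_kind {
	ENTRY_REUSED,	/* copied from an existing pack; compressed size known */
	ENTRY_DELTA,	/* compressed delta size known after the search phase */
	ENTRY_LOOSE,	/* loose object; compressed size only estimated */
};

struct fake_entry {
	enum entry_kind kind;
	uintmax_t payload_size;	/* compressed bytes we expect to write */
	uintmax_t object_size;	/* uncompressed size, used for the header */
	int is_ref_delta;	/* REF_DELTA carries a 20-byte base object id */
};

/*
 * Length of the variable-length object header of a pack entry:
 * one byte holding the type and the low 4 bits of the size, then
 * one byte per further 7 bits of size (high bit = continuation).
 */
static int pack_header_len(uintmax_t size)
{
	int n = 1;
	size >>= 4;
	while (size) {
		size >>= 7;
		n++;
	}
	return n;
}

/*
 * Tally an estimated total pack size.  Reused and deltified entries
 * are exact; loose entries are off by whatever recompression changes.
 */
static uintmax_t estimate_packed_size(const struct fake_entry *e, size_t n)
{
	uintmax_t total = 12 + 20;	/* "PACK" header + trailing SHA-1 */
	size_t i;

	for (i = 0; i < n; i++) {
		total += pack_header_len(e[i].object_size);
		if (e[i].is_ref_delta)
			total += 20;	/* base object id follows the header */
		total += e[i].payload_size;
	}
	return total;
}

int main(void)
{
	struct fake_entry list[] = {
		{ ENTRY_REUSED, 1234, 4096, 0 },
		{ ENTRY_DELTA,   200, 8192, 1 },
		{ ENTRY_LOOSE,   900, 2048, 0 },
	};

	printf("estimated pack size: %ju bytes\n",
	       estimate_packed_size(list, sizeof(list) / sizeof(list[0])));
	return 0;
}

Even with all four items above accounted for, the loose-object entries
keep the total an estimate rather than an exact byte count, which is
why counting objects (as in the earlier patch) is the simpler fix for
now.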