Re: [PATCH] git-{repack,pack-objects} accept --{pack,blob}-limit to control pack size

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Thu, 5 Apr 2007 03:15:12 -0400

Nicolas Pitre <nico@xxxxxxx> wrote:
> But let's take a moment to talk about your "newbie submission".  This 
> patch is _way_ too large and covers too many things at once.  You'll 
> have to split it in many smaller logical units pretty please.

Yes, absolutely!  If I was doing this series I'd probably have this
broken into at least 5, maybe even 8 patches.  There's a lot of
stuff here, and some of it was completely unrelated to each other.
Like a fix for lseek arguments?  Uhh, that's so not what this patch
was about, but is still a good fix.  Nice catch.  Lets get it in,
but properly please.

I also noticed a lot of "/* and here we could do... */" markers.
I think these belong more in the commit message than in the code
itself, as the code gets really cluttered after a while with the
todo markers that nobody has done yet.  Odds are, if you don't do
it in this series, it won't get done for quite a while, if ever.
Mention it in your commit message, so others are aware of it,
and leave it at that.

> On Wed, 4 Apr 2007, Dana How wrote:
> > I have a similar but much weaker reaction, but Linus specifically asked for
> > this combination to work.
> 
> Linus is a total imcompetent who knows nothing about programming or good 
> taste.  So never ever listen to what he says.  He is wrong, always.

Now now, I wouldn't go that far...

> And in this case he's more wrong than usual.

But yes, in this case I certainly I agree with you Nico.  ;-)

> The appropriate location for the splitting of packs in a fetch 
> context is really within index-pack not in pack-objects.  And it is 
> probably so much easier to do in index-pack anyway.

Or in a push context.  I have started down the road of doing the
pack splitting in index-pack, but never was able to finish it.
It is totally doable in index-pack, but it turned out to be more
work than I had time for when I started it.

I still have the topic laying in my repository, and it is pushed
to my fastimport.git fork on repo.or.cz, if someone wants to take
a look at it.  The patch is far from complete, hence no patch to
mailing list.

> > But blobs even close to the packfile limit don't seem all that useful
> > to pack either (this of course is a weaker argument).
> 
> So you still have to convince me this is a useful feature.

Me too.  Nico's right.  And lets just assume we are in a bad case
where we need to put one (and only one) such blob into each packfile.
The packfile on disk will be an additional 32 bytes larger than if
it were to stay a loose file, and that's assuming the loose file was
created using the newer-style loose file format.  This is unlikely
to cause a large disk space overhead.

Yea, we to have an extra SHA-1 we have to compute when we created
the file, but then we also have another SHA-1 to verify that the
zlib stream is not corrupt, *without* unpacking the zlib stream.
That in and of itself could be useful for an fsck, we could do a
faster rule that says "if there's only a couple of blobs in this
packfile, and the thing is *big*, just do a check of the packfile
SHA-1 and assume the blobs are OK".  So there actually might be
value to that extra SHA-1 after all.

-- 
Shawn.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html