On 4/6/07, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 5 Apr 2007, Nicolas Pitre wrote:
> > > (2) Pack the object any way and let the packfile size exceed
> > > my specification.  Ignoring a clear preference from the user
> > > doesn't seem good.
> >
> > It is not indeed.
>
> Well, I think there is an easy solution.  Just go back and say that
> when the user limits the size, it limits the offset at which objects
> can *start*.
Yes, this is what my original, unsplit patch did in order to support
the --pack-limit && --stdout combination, since lseek() doesn't always
work on stdout (it may be a pipe).
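Roughly, the logic looks like this -- a self-contained toy where the
names, sizes, and the 2GB figure are mine for illustration, not the
actual identifiers in builtin-pack-objects.c:

#include <stdio.h>
#include <stdint.h>

static const uint64_t pack_limit = (uint64_t)1 << 31;  /* e.g. 2GB */
static uint64_t offset;
static int pack_nr;

static void start_new_pack(void)
{
        pack_nr++;
        offset = 12;    /* pack header: "PACK", version, object count */
}

static void write_object(uint64_t compressed_size)
{
        /* The check happens BEFORE the write, so nothing is ever
         * undone; only the START offset is bounded -- the last
         * object may still carry the pack past the limit. */
        if (!pack_nr || offset >= pack_limit)
                start_new_pack();
        printf("pack %d: object starts at %llu\n",
               pack_nr, (unsigned long long)offset);
        offset += compressed_size;
}

int main(void)
{
        write_object(1000);
        write_object((uint64_t)3 << 30);  /* a 3GB blob */
        write_object(500);                /* starts in a fresh pack */
        return 0;
}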
> Not only is that the only thing that the index file itself cares
> about, it also means that
>
>  - you don't ever need to undo anything at all (because you know the
>    starting offset) when you begin packing a new object.
Yes, that was true.
> This should simplify your patch a lot.
It removes the calls to, and the additions of, sha1mark() and
sha1undo() in csum-file.[ch].  The changes to builtin-pack-objects.c
are almost the same -- an "if" simply moves from after the
write_object() call to before it.
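To show the difference, here is a toy, self-contained sketch of the
rollback that a true size limit needs.  sha1mark() and sha1undo() are
the helpers my patch adds (the real ones also checkpoint the SHA-1
state); the struct, sizes, and signatures below are made up, not
git's actual csum-file interface:

#include <stdio.h>
#include <string.h>

/* With a limit on pack SIZE, an oversized object is discovered only
 * after it has been written, so the write has to be undone. */
struct sha1file {
        char buf[64];
        size_t len;
        size_t mark;
};

static void sha1mark(struct sha1file *f) { f->mark = f->len; }
static void sha1undo(struct sha1file *f) { f->len = f->mark; }

static void sha1write(struct sha1file *f, const char *p, size_t n)
{
        memcpy(f->buf + f->len, p, n);
        f->len += n;
}

int main(void)
{
        struct sha1file f = { .len = 0 };
        const size_t pack_limit = 16;

        sha1mark(&f);
        sha1write(&f, "one-large-object-payload", 24);
        if (f.len > pack_limit) {
                sha1undo(&f);   /* roll back; object goes in the next pack */
                printf("rolled back to %zu bytes\n", f.len);
        }
        return 0;
}

With Linus's offset rule the test moves in front of sha1write(), and
both helpers become unnecessary.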
>  - the object size issue just goes away.
>
> Sure, the pack-file limit looks a bit strange (it's no longer a hard
> limit on the *size* of the pack-file, just on the object offsets),
> but let's face it, we really really don't care.  And in practice, by
> setting the pack-file limit to 2GB (or even 1GB), you never ever have
> to worry about the 32-bit filesystem limits any more, unless your
> repository is fundamentally so screwed that you simply *cannot*
> represent it well on something like FATFS (ie any object that is 2GB
> in size will probably have a blob that is even bigger, and FATFS
> already has problems with it).
>
> So in practice, just limiting the index offsets is what you really
> want anyway.
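To spell out the arithmetic behind that 2GB figure, here is my own
toy illustration (nothing from git's source):

#include <stdio.h>
#include <stdint.h>

/* The v1 .idx format records each object's pack offset in 4 bytes,
 * and a signed 32-bit off_t tops out at 2^31 - 1.  So if objects may
 * only START below 2GB, every recorded offset stays representable,
 * even when the final object pushes the pack itself past 2GB. */
int main(void)
{
        const uint64_t off31_max   = ((uint64_t)1 << 31) - 1;
        const uint64_t start_cap   = (uint64_t)1 << 31;   /* 2GB */
        const uint64_t worst_start = start_cap - 1;

        printf("worst-case recorded offset: %llu\n",
               (unsigned long long)worst_start);
        printf("fits in a signed 32-bit off_t: %s\n",
               worst_start <= off31_max ? "yes" : "no");
        return 0;
}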
I agree with your arguments, but packs have different uses, and to me
there are several differing (non-)reasons for --pack-limit:

* To split packs into smaller chunks when the .pack format is used as
  a communications protocol.  But the discussion has converged on
  "we're not going to split packs at the transmitter", and I agree
  with that conclusion, since splitting would make the total
  transmission larger (loss of deltas).  Since no .idx file is sent,
  there is no requirement for --pack-limit here.

* To avoid offsets larger than 31 bits in .idx files.  Your proposal,
  and what I was doing for --pack-limit && --stdout, is sufficient to
  address this.

* To avoid (e.g.) 2GB+ files when none already exist in the
  repository -- either the filesystem doesn't support anything beyond
  the limit, or we don't want to use a >31b off_t with mmap.
  (Perhaps the latter case is completely avoided by some glibc 64b
  trickery, but is that always true?)  Only the write-rollback
  approach can address this.

* To create .pack files that fit on e.g. a CDROM.

The 2nd and 3rd cases are what I'm thinking about, which is why the
first version of my patch did both.

Anyway, now that I understand Nicolas's issues with the patchset, and
realize I agree with his concerns, I'm going to let this percolate a
little until I've got the smallest number of added behaviors and
command-line options.  At the moment I'm leaning towards killing my
--blob-limit idea; keeping --pack-limit based on rollback, but with
the additional twist that the first object written to a packfile is
never "rolled back" [meaning (a) everything gets packed, which makes
Nicolas happy, (b) any illegally-sized pack contains only one object
and could be removed, which makes me happy, and (c) my patchset has
one less bug]; and adding --index-limit, which is what normal people
might use on a large repository [and could be made the default
later].

So I think what you discuss is the most common use for limiting.

Thanks,
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell