On 4/6/07, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, 5 Apr 2007, Nicolas Pitre wrote:
> > > (2) Pack the object any way and let the packfile size exceed
> > > my specification.  Ignoring a clear preference from the user
> > > doesn't seem good.
> >
> > It is not indeed.
>
> Well, I think there is an easy solution.  Just go back and say that
> when the user limits the size, it limits the offset at which objects
> can *start*.
Yes, this is what my original, unsplit patch did in order to support
the --pack-limit && --stdout combination, since lseek() doesn't always
work on stdout (it may be a pipe).
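Roughly, the logic looks like this -- a self-contained toy where the
names, sizes, and the 2GB figure are mine for illustration, not the
actual identifiers in builtin-pack-objects.c:

#include <stdio.h>
#include <stdint.h>

static const uint64_t pack_limit = (uint64_t)1 << 31;  /* e.g. 2GB */
static uint64_t offset;
static int pack_nr;

static void start_new_pack(void)
{
        pack_nr++;
        offset = 12;    /* pack header: "PACK", version, object count */
}

static void write_object(uint64_t compressed_size)
{
        /* The check happens BEFORE the write, so nothing is ever
         * undone; only the START offset is bounded -- the last
         * object may still carry the pack past the limit. */
        if (!pack_nr || offset >= pack_limit)
                start_new_pack();
        printf("pack %d: object starts at %llu\n",
               pack_nr, (unsigned long long)offset);
        offset += compressed_size;
}

int main(void)
{
        write_object(1000);
        write_object((uint64_t)3 << 30);  /* a 3GB blob */
        write_object(500);                /* starts in a fresh pack */
        return 0;
}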
> Not only is that the only thing that the index file itself cares
> about, it also means that
>
>  - you don't ever need to undo anything at all (because you know the
>    starting offset) when you begin packing a new object.
Yes, that was true.
> This should simplify your patch a lot.
It removes the calls to, and the additions of, sha1mark() and
sha1undo() in csum-file.[ch].  The changes to builtin-pack-objects.c
are almost the same -- an "if" simply moves from after the
write_object() call to before it.
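To show the difference, here is a toy, self-contained sketch of the
rollback that a true size limit needs.  sha1mark() and sha1undo() are
the helpers my patch adds (the real ones also checkpoint the SHA-1
state); the struct, sizes, and signatures below are made up, not
git's actual csum-file interface:

#include <stdio.h>
#include <string.h>

/* With a limit on pack SIZE, an oversized object is discovered only
 * after it has been written, so the write has to be undone. */
struct sha1file {
        char buf[64];
        size_t len;
        size_t mark;
};

static void sha1mark(struct sha1file *f) { f->mark = f->len; }
static void sha1undo(struct sha1file *f) { f->len = f->mark; }

static void sha1write(struct sha1file *f, const char *p, size_t n)
{
        memcpy(f->buf + f->len, p, n);
        f->len += n;
}

int main(void)
{
        struct sha1file f = { .len = 0 };
        const size_t pack_limit = 16;

        sha1mark(&f);
        sha1write(&f, "one-large-object-payload", 24);
        if (f.len > pack_limit) {
                sha1undo(&f);   /* roll back; object goes in the next pack */
                printf("rolled back to %zu bytes\n", f.len);
        }
        return 0;
}

With Linus's offset rule the test moves in front of sha1write(), and
both helpers become unnecessary.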
>  - the object size issue just goes away.
>
> Sure, the pack-file limit looks a bit strange (it's no longer a hard
> limit on the *size* of the pack-file, just on the object offsets),
> but let's face it, we really really don't care.  And in practice, by
> setting the pack-file limit to 2GB (or even 1GB), you never ever have
> to worry about the 32-bit filesystem limits any more, unless your
> repository is fundamentally so screwed that you simply *cannot*
> represent it well on something like FATFS (ie any object that is 2GB
> in size will probably have a blob that is even bigger, and FATFS
> already has problems with it).
>
> So in practice, just limiting the index offsets is what you really
> want anyway.
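To spell out the arithmetic behind that 2GB figure, here is my own
toy illustration (nothing from git's source):

#include <stdio.h>
#include <stdint.h>

/* The v1 .idx format records each object's pack offset in 4 bytes,
 * and a signed 32-bit off_t tops out at 2^31 - 1.  So if objects may
 * only START below 2GB, every recorded offset stays representable,
 * even when the final object pushes the pack itself past 2GB. */
int main(void)
{
        const uint64_t off31_max   = ((uint64_t)1 << 31) - 1;
        const uint64_t start_cap   = (uint64_t)1 << 31;   /* 2GB */
        const uint64_t worst_start = start_cap - 1;

        printf("worst-case recorded offset: %llu\n",
               (unsigned long long)worst_start);
        printf("fits in a signed 32-bit off_t: %s\n",
               worst_start <= off31_max ? "yes" : "no");
        return 0;
}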
I agree with your arguments, but packs have different uses, and to me
there are several differing (non-)reasons for --pack-limit:

* To split packs into smaller chunks when the .pack format is used as
  a communications protocol.  But the discussion has converged on
  "we're not going to split packs at the transmitter", and I agree
  with that conclusion, since splitting would make the total
  transmission larger (loss of deltas).  Since no .idx file is sent,
  there is no requirement for --pack-limit here.

* To avoid offsets larger than 31 bits in .idx files.  Your proposal,
  and what I was doing for --pack-limit && --stdout, is sufficient to
  address this.

* To avoid (e.g.) 2GB+ files when none already exist in the
  repository -- either the filesystem doesn't support anything beyond
  the limit, or we don't want to use a >31b off_t with mmap.
  (Perhaps the latter case is completely avoided by some glibc 64b
  trickery, but is that always true?)  Only the write-rollback
  approach can address this.

* To create .pack files that fit on e.g. a CDROM.

The 2nd and 3rd cases are what I'm thinking about, which is why the
first version of my patch did both.

Anyway, now that I understand Nicolas's issues with the patchset, and
realize I agree with his concerns, I'm going to let this percolate a
little until I've got the smallest number of added behaviors and
command-line options.  At the moment I'm leaning towards killing my
--blob-limit idea; keeping --pack-limit based on rollback, but with
the additional twist that the first object written to a packfile is
never "rolled back" [meaning (a) everything gets packed, which makes
Nicolas happy, (b) any illegally-sized pack contains only one object
and could be removed, which makes me happy, and (c) my patchset has
one less bug]; and adding --index-limit, which is what normal people
might use on a large repository [and could be made the default
later].

So I think what you discuss is the most common use for limiting.

Thanks,
--
Dana L. How  danahow@xxxxxxxxx  +1 650 804 5991 cell