On Mon, 26 Nov 2007, Dana How wrote:

> On Nov 26, 2007 11:52 AM, Nicolas Pitre <nico@xxxxxxx> wrote:
> > On Mon, 26 Nov 2007, Dana How wrote:
> > > Currently data can be quickly copied from pack to pack,
> > > but data cannot be quickly copied blob->pack or pack->blob
> >
> > I don't see why you would need the pack->blob copy normally.
>
> True, but that doesn't change the main point.

Sure, but let's not go overboard either.

> > > (there was an alternate blob format that supported this,
> > > but it was deprecated).  Using the pack format for blobs
> > > would fix this.
> >
> > Then you can do just that for big enough blobs where "big enough" is
> > configurable: encapsulate them in a pack instead of a loose object.
> > Problem solved.  Sure you'll end up with a bunch of packs containing
> > only one blob object, but given that those blobs are so large as to be
> > a problem in your work flow when written out as loose objects, then
> > they certainly must be few enough not to cause an explosion in the
> > number of packs.
>
> Are you suggesting that "git add" create a new pack containing
> one blob when the blob is big enough?

Exactly.

> Re-using (part of) the pack format
> in a blob (or maybe only some blobs) seems like less code change.

Don't know what you mean exactly here, but what I mean is to do
something as simple as:

	pretend_sha1_file(...);
	add_object_entry(...);
	write_pack_file();

when the buffer to make a blob from is larger than a configured
threshold.

> > > It would also mean blobs wouldn't need to
> > > be uncompressed to get the blob type or size I believe.
> >
> > They already don't.
>
> It looks like sha1_file.c:parse_sha1_header() works on a buffer
> filled in by sha1_file.c:unpack_sha1_header() by calling inflate(), right?
>
> It is true you don't have to uncompress the *entire* blob.

Right.  Only the first 16 bytes or so need to be uncompressed.
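The header-only decompression described above can be sketched with Python's stdlib `zlib` (an illustration of the loose-object layout, not git's actual C code; the sample blob bytes are made up). A loose object is "<type> <size>\0<data>" deflated as one zlib stream, so capping the decompressor's output is roughly what unpack_sha1_header() achieves by inflating only a small buffer:

```python
import zlib

# Build a fake loose object: "blob <size>\0" followed by the data,
# all deflated as a single zlib stream (illustrative, not git code).
data = b"x" * 1_000_000
loose = zlib.compress(b"blob %d\0" % len(data) + data)

# To learn the type and size, inflate only the first bytes of output:
# max_length=32 stops the decompressor long before the megabyte of data.
d = zlib.decompressobj()
header = d.decompress(loose, 32)            # at most 32 bytes of output
obj_type, size = header.split(b"\0", 1)[0].split(b" ")
```

Note the cap is on *output* bytes, so the cost is independent of the blob's size.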
> > > The equivalent operation in git would require the creation of
> > > the blob, and then of a temporary pack to send to the server.
> > > This requires 3 calls to zlib for each blob, which for very
> > > large files is not acceptable at my site.
> >
> > I currently count 2 calls to zlib, not 3.
>
> I count 3:
>
> Call 1: git-add calls zlib to make the blob.
>
> Call 2: builtin-pack-objects.c:write_one() calls sha1_file.c:read_sha1_file()
> calls :unpack_sha1_file() calls :unpack_sha1_{header,rest}() calls
> inflate() to get the data from the blob into a buffer.
>
> Call 3: Then write_one() calls deflate to make the new buffer
> to write into the pack.  This is all under the "if (!to_reuse) {" path,
> which is active when packing a blob.

Oh, you're right.  Somehow I didn't count the needed decompression.

> Remember, I'm comparing "p4 submit file" to
> "git add file"/"git commit"/"git push", which is the comparison
> the users will be making.
>
> On the other hand, I'm looking at code from June;
> but I haven't noticed big changes since then on the list.
>
> Calls 2 and 3 go away if the blob and pack formats were more similar.

... which my suggestion should provide with a minimum of changes, maybe
less than 10 lines of code.


Nicolas
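For reference, the single-object pack proposed above has a simple on-disk shape: a 12-byte header ("PACK", version, object count), one object entry (a varint type/size header followed by the deflated data), and a SHA-1 trailer over everything before it. A rough Python sketch of pack format v2 (`single_blob_pack` and `pack_object_header` are hypothetical helper names, not git functions; real git also writes a matching .idx file and may delta-compress entries):

```python
import hashlib
import struct
import zlib

def pack_object_header(obj_type: int, size: int) -> bytes:
    # The entry header packs the 3-bit type and the low 4 size bits into
    # the first byte; remaining size bits follow, 7 per byte, MSB = "more".
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)

def single_blob_pack(data: bytes) -> bytes:
    OBJ_BLOB = 3
    body = b"PACK" + struct.pack(">II", 2, 1)      # magic, version 2, 1 object
    body += pack_object_header(OBJ_BLOB, len(data))
    body += zlib.compress(data)                    # entry payload is deflated
    return body + hashlib.sha1(body).digest()      # 20-byte trailer checksum

pack = single_blob_pack(b"hello, large blob\n")
```

Note the trailing SHA-1 covers the header and every entry, which is why the pack can be verified without an index.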