Jeff King <peff@xxxxxxxx> writes:

> On Thu, Jan 31, 2013 at 09:03:26AM -0800, Shawn O. Pearce wrote:
> ...
>> If we are going to change the index to support extension sections and
>> I have to modify JGit to grok this new format, it needs to be index v3
>> not index v2. If we are making index v3 we should just put index v3 on
>> the end of the pack file.
>
> I'm not sure what you mean by your last sentence here.

I am not Shawn, but here is a summary of what I think I discussed
with him in person, lest I forget.

You could imagine a new pack system (from pack-objects and
index-pack down to the read_packed_sha1() call) that works with a
packfile that

 * is a single file, whose name is pack-$SHA1.$sfx (where $sfx is
   something other than 'pack', perhaps);

 * has the pack data stream, including the concluding checksum of
   the stream contents at the end, at the beginning of the file; and

 * has the index v3 data blob appended to the pack data stream.

The pack data is streamed over the wire exactly the same way as
today, interoperating with existing software. When receiving, the
new index-pack can read such a pack stream and append the index at
the end.

When re-indexing an existing pack (think: upgrading existing
packfiles from the current system), index-pack can read from the
packfile and do what it does currently. Notably, it knows where the
pack stream ends and can stop reading at that point, so even if you
feed the new pack to it, it will stop at the end of the pack data,
ignoring the index v3 already at the end of the input.
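To illustrate why a reader can ignore whatever follows the pack data,
here is a rough sketch in Python (not git's actual C code). It walks
the object entries of a version 2 pack stream and returns the offset
just past the trailing SHA-1 checksum. The names make_pack() and
pack_stream_end() are made up for this example, and the appended
"index v3" bytes are only a placeholder, since no such format has been
defined.

```python
import hashlib
import struct
import zlib

OBJ_BLOB, OBJ_OFS_DELTA, OBJ_REF_DELTA = 3, 6, 7


def make_pack(blobs):
    """Build a toy version 2 pack stream holding undeltified blobs."""
    body = struct.pack(">4sII", b"PACK", 2, len(blobs))
    for data in blobs:
        size = len(data)
        byte = (OBJ_BLOB << 4) | (size & 0x0F)  # type + low 4 size bits
        size >>= 4
        header = bytearray()
        while size:                              # 7 more size bits per byte
            header.append(byte | 0x80)
            byte = size & 0x7F
            size >>= 7
        header.append(byte)
        body += bytes(header) + zlib.compress(data)
    return body + hashlib.sha1(body).digest()    # concluding checksum


def pack_stream_end(buf):
    """Return the offset just past the pack stream's trailing checksum."""
    magic, _version, count = struct.unpack(">4sII", buf[:12])
    if magic != b"PACK":
        raise ValueError("not a pack stream")
    pos = 12
    for _ in range(count):
        byte = buf[pos]
        pos += 1
        obj_type = (byte >> 4) & 7
        while byte & 0x80:                       # skip the rest of the size
            byte = buf[pos]
            pos += 1
        if obj_type == OBJ_REF_DELTA:
            pos += 20                            # base object name
        elif obj_type == OBJ_OFS_DELTA:
            while buf[pos] & 0x80:               # variable-length base offset
                pos += 1
            pos += 1
        inflater = zlib.decompressobj()
        inflater.decompress(buf[pos:])
        if not inflater.eof:
            raise ValueError("truncated object data")
        # Whatever zlib did not consume belongs to the next entry.
        pos = len(buf) - len(inflater.unused_data)
    end = pos + 20
    if hashlib.sha1(buf[:pos]).digest() != buf[pos:end]:
        raise ValueError("pack checksum mismatch")
    return end


pack = make_pack([b"hello", b"x" * 100])
combined = pack + b"placeholder for an index v3 blob"
assert pack_stream_end(combined) == len(pack)
```

Because the object count in the header and the self-delimiting zlib
streams determine where the pack data ends, the same walk works
whether or not anything has been appended after the checksum.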
One potential advantage of using a single file, instead of the
primary .pack file with 3 (or 47) auxiliary files, is that it lets
you repack without having to deal with this sequence, which happens
currently when you repack:

 * create a new .pack file and the corresponding auxiliary files
   under temporary filenames;

 * move existing pack files that describe the same set of objects
   away;

 * rename these new files, one at a time, to their final names,
   making sure that you rename .idx last, because that happens to
   be the key for pack-aware programs.

Instead you can rename only one thing (the new one) to the final
name (possibly atomically replacing the existing one).

With the current system, when you need to replace a pack with a new
pack with the same packname (e.g. you repack everything with a
better pack parameter in a repository that has everything packed
into one), there is a very small window during which other
concurrent users will not find the object data, between the time
when you rename the old ones away and the time when you move the new
ones in. The hairy logic between "Ok we have prepared all new
packfiles" and "End of pack replacement" can be replaced with a
single rename(2) of the new packfile (which contains everything) to
the final name, which atomically replaces the old one.

This will become even safer if we pick $SHA1 (the name of the
packfile) to represent the hash of the whole thing, not the hash of
the sorted object names in the pack, as that will let us know there
is no need to even "replace" the files.
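The single-rename installation could look roughly like the following
Python sketch. It is only an illustration under assumptions: the
function name install_packfile() and the "packx" suffix are invented
here, and the file is named after the SHA-1 of its entire contents, as
suggested above, so an existing file with the same name is known to
hold the same bytes.

```python
import hashlib
import os
import tempfile


def install_packfile(pack_dir, contents, suffix="packx"):
    """Install a combined pack+index file with one atomic rename."""
    # Hypothetical naming rule: the hash covers the whole file.
    name = "pack-%s.%s" % (hashlib.sha1(contents).hexdigest(), suffix)
    final = os.path.join(pack_dir, name)
    if os.path.exists(final):
        return final                 # identical file already in place

    # Write under a temporary name in the same directory so that the
    # final rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=pack_dir, prefix="tmp_pack_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(contents)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, final)       # the only rename; rename(2) on POSIX
    finally:
        if os.path.exists(tmp):      # clean up only if the rename failed
            os.unlink(tmp)
    return final
```

Concurrent readers see either no file under the final name or the
complete new one, never a .pack without its index, which is the window
the multi-file sequence has to work around.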