Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> How about adding a flag to repack that simply says delete the objects
> when done with them? I'd still create all of the objects on disk.
> Repack would assume that they have at least been sorted by filename.
> So repack could read in object names until it sees a change in the
> file name, sort them by size, deltify, write out the pack, and then
> delete the objects from that batch. Then repeat this process for the
> next file name on stdin.
>
> I'm making two assumptions: first, that blocks from a deleted file
> don't get written to disk; and second, that by deleting the file the
> file system will reuse the same blocks over and over. If those
> assumptions are close to being true, then the cache shouldn't thrash.
> They don't have to be totally true; close is good enough.
>
> Of course, eliminating the files altogether will be even faster.

See the email I just sent you. The only file being written is the pack
file that's being generated: no temporary files, no temporary inodes,
no temporary blocks. There are only two passes over the data: one to
write it out and a second to generate the SHA1. I do two passes rather
than keeping it all in memory to prevent the program from blowing up
on extremely large inputs.

It may be possible to tweak git-pack-objects to get what you propose
above, but to be honest I think the git-fast-import I just sent was
easier, especially as it avoids the temporary loose object stage.

-- 
Shawn.
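
As an illustration, here is a minimal sketch in C of the
batch-per-filename repack loop proposed above. It assumes stdin
carries "<sha1> <path>" lines already sorted by path; the
loose_object_size(), deltify_and_write_pack(), and
delete_loose_object() helpers are hypothetical stubs standing in for
the real repack machinery, not git's actual code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct entry {
            char sha1[41];
            long size;
    };

    /* Hypothetical stand-ins for the real object machinery. */
    static long loose_object_size(const char *sha1)
    {
            (void)sha1;
            return 0;       /* would stat the loose object */
    }
    static void deltify_and_write_pack(struct entry *batch, int n)
    {
            (void)batch; (void)n;   /* would deltify and emit a pack */
    }
    static void delete_loose_object(const char *sha1)
    {
            (void)sha1;     /* would unlink so the fs reuses the blocks */
    }

    static int by_size(const void *a, const void *b)
    {
            long d = ((const struct entry *)a)->size -
                     ((const struct entry *)b)->size;
            return (d > 0) - (d < 0);
    }

    static void flush_batch(struct entry *batch, int n)
    {
            int i;

            if (!n)
                    return;
            qsort(batch, n, sizeof(*batch), by_size); /* sort by size */
            deltify_and_write_pack(batch, n);
            for (i = 0; i < n; i++)   /* free the blocks right away */
                    delete_loose_object(batch[i].sha1);
    }

    int main(void)
    {
            static struct entry batch[4096];
            char line[4200], last_path[4096] = "";
            int n = 0;

            /* stdin: "<sha1> <path>" lines, pre-sorted by path */
            while (fgets(line, sizeof(line), stdin)) {
                    char sha1[41], path[4096];

                    if (sscanf(line, "%40s %4095s", sha1, path) != 2)
                            continue;
                    /* filename changed (or batch full): close it out */
                    if (n == 4096 || (n && strcmp(path, last_path))) {
                            flush_batch(batch, n);
                            n = 0;
                    }
                    strcpy(batch[n].sha1, sha1);
                    batch[n].size = loose_object_size(sha1);
                    n++;
                    strcpy(last_path, path);
            }
            flush_batch(batch, n);
            return 0;
    }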
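
And a minimal sketch of the two-pass finalization described in the
reply, assuming OpenSSL's SHA1 functions for the checksum pass; the
actual git-fast-import code may differ. Pass one streams objects into
the pack file as they arrive; pass two re-reads the file from disk and
appends the 20-byte SHA-1 trailer, so nothing has to stay in memory.

    #include <stdio.h>
    #include <openssl/sha.h>    /* link with -lcrypto */

    /* Pass two: re-read everything pass one wrote and append
     * the whole-file SHA-1 as the pack trailer. */
    static int append_pack_trailer(const char *path)
    {
            unsigned char buf[8192], hash[SHA_DIGEST_LENGTH];
            SHA_CTX ctx;
            size_t n;
            FILE *f;

            f = fopen(path, "rb");
            if (!f)
                    return -1;
            SHA1_Init(&ctx);
            while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
                    SHA1_Update(&ctx, buf, n);
            fclose(f);
            SHA1_Final(hash, &ctx);

            f = fopen(path, "ab");      /* append the 20-byte trailer */
            if (!f)
                    return -1;
            fwrite(hash, 1, sizeof(hash), f);
            fclose(f);
            return 0;
    }

    int main(int argc, char **argv)
    {
            if (argc != 2)
                    return 1;
            return append_pack_trailer(argv[1]) ? 1 : 0;
    }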