On Fri, Sep 29, 2017 at 11:43:57PM +0200, Johannes Schindelin wrote:

> On Thu, 28 Sep 2017, Jeff King wrote:
>
> > If you're planning on using an oidset to mark every object in a
> > 100-million-object monorepo, we'd probably care more. But I'd venture to
> > say that any scheme which involves generating that hash table on the fly
> > is doing it wrong. At that scale we'd want to look at compact
> > mmap-able on-disk representations.
>
> Or maybe you would look at a *not-so-compact* mmap()able on-disk
> representation, to allow for painless updates.
>
> You really will want to avoid having to write out large files just because
> a small part of them changed. We learned that lesson the hard way, from
> having to write 350MB worth of .git/index for every single, painful `git
> add` operation.

Sure. I didn't mean to start designing the format. I just mean that if
the first step of the process is "read information about all 100 million
objects into an in-RAM hashmap", then that is definitely not going to fly.

-Peff
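[To illustrate the mmap-able on-disk idea being discussed: a minimal sketch, not any actual git format. It assumes a hypothetical file of sorted, fixed-width binary oids; membership is tested by binary-searching the mapped file, so the kernel only pages in the handful of entries the search touches instead of building a 100-million-entry in-RAM hashmap. The function name and file layout are invented for the example.]

```python
import mmap
import os

OID_LEN = 20  # binary SHA-1 width; the fixed record size is what makes this searchable


def oid_in_sorted_file(path, oid):
    """Membership test against a file of sorted, fixed-width binary oids.

    mmap()s the file and binary-searches it in place, so memory use is
    O(1) regardless of how many objects the file describes.
    """
    assert len(oid) == OID_LEN
    with open(path, "rb") as f:
        n = os.fstat(f.fileno()).st_size // OID_LEN
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            lo, hi = 0, n
            while lo < hi:
                mid = (lo + hi) // 2
                probe = m[mid * OID_LEN:(mid + 1) * OID_LEN]
                if probe < oid:
                    lo = mid + 1
                elif probe > oid:
                    hi = mid
                else:
                    return True
            return False
```

A lookup touches O(log n) pages, and updating one entry only dirties the page it lives on, which is the "painless updates" property being argued for above.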