On Wed, Mar 21 2018, Duy Nguyen wrote:

> On Wed, Mar 21, 2018 at 9:24 AM, Jeff King <peff@xxxxxxxx> wrote:
>> On Sun, Mar 18, 2018 at 03:25:15PM +0100, Nguyễn Thái Ngọc Duy wrote:
>>
>>> v6 fixes the one optimization that I just couldn't get right, fixes
>>> two off-by-one error messages, and has a couple of commit message
>>> updates (the biggest change is in 11/11, to record some numbers from
>>> Ævar).
>>
>> I was traveling during some of the earlier rounds, so I finally got a
>> chance to take a look at this.
>>
>> I hate to be a wet blanket, but am I the only one who is wondering
>> whether the tradeoff is worth it? An 8% memory reduction doesn't seem
>> mind-bogglingly good,
>
> Ævar measured RSS. If we count the objects[] array alone, the saving
> is 40% (136 bytes per entry down to 80). Some of the difference is
> probably eaten up by mmap in RSS.

Yeah, sorry about spreading that confusion.

>> and I'm concerned about two things:
>>
>>   1. The resulting code is harder to read and reason about (things
>>      like the DELTA() macros), and seems a lot more brittle (things
>>      like the new size_valid checks).
>>
>>   2. There are lots of new limits. Some of these are probably fine
>>      (e.g., the cacheable delta size), but things like the
>>      number-of-packs limit don't have very good user-facing behavior.
>>      Yes, having that many packs is insane, but that's going to be
>>      small consolation to somebody whose automated maintenance
>>      program now craps out at 16k packs, when it previously would
>>      have just worked to fix the situation.
>>
>> Saving 8% is nice, but the number of objects in linux.git grew over
>> 12% in the last year. So you've bought yourself 8 months before the
>> problem is back. Is it worth making these changes that we'll have to
>> deal with for many years to buy 8 months of memory savings?
>
> Well, with 40% it buys us a couple more months. The object growth
> affects "rev-list --all" too, so the actual number of "good months"
> is probably not far from 8.
>
> Is it worth saving? I don't know.
> I raised the readability point from the very first patch, and if
> people believe it makes the code much harder to read, then no, it's
> not worth it.
>
> While pack-objects is simple from a functionality point of view, it
> has received lots of optimizations and is, to me, quite fragile.
> Readability does count in this code. Fortunately it still looks quite
> OK to me with this series applied (but then, that's subjective).
>
> About the 16k limit (and some other limits as well): I'm making these
> patches on the assumption that large-scale deployments will probably
> go with custom builds anyway. Adjusting the limits back up should be
> quite easy, while we can still provide reasonable defaults for most
> people.
>
>> I think ultimately to work on low-memory machines we'll need a
>> fundamentally different approach that scales with the objects since
>> the last pack, and not with the complete history.
>
> Absolutely. That is covered in a separate "gc --auto" series. Some
> memory reduction here may still be nice to have, though. Even on a
> beefy machine, memory can be put to better use somewhere other than
> being wasted in unused bits.

FWIW I've been running a combination of these two series at work (also
keeping the big pack), and they've had a sizable impact on packing our
monorepo. On one of our dev boxes, on a real-world checkout with a
combo of the "base" pack and other packs + loose objects, as measured
by /usr/bin/time:

 * Reduction in user time by 37%
 * Reduction in system time by 84%
 * Reduction in RSS by 61%
 * Reduction in page faults by 58% & 94% (time(1) reports two
   different numbers)
 * Reduction in the I of I/O by 58%
 * Reduction in the O of I/O by 94%