On Mon, Mar 30, 2020 at 10:03:09AM -0400, Jeff King wrote: > The oid_array object uses an "int" to store the number of items and the > allocated size. It's rather unlikely for somebody to have more than 2^31 > objects in a repository (the sha1's alone would be 40GB!), but if they > do, we'd overflow our alloc variable. > > You can reproduce this case with something like: > > git init repo > cd repo > > # make a pack with 2^24 objects > perl -e ' > my $nr = 2**24; > > for (my $i = 0; $i < $nr; $i++) { > print "blob\n"; > print "data 4\n"; > print pack("N", $i); > } > ' | git fast-import > > # now make 256 copies of it; most of these objects will be duplicates, > # but oid_array doesn't de-dup until all values are read and it can > # sort the result. > cd .git/objects/pack/ > pack=$(echo *.pack) > idx=$(echo *.idx) > for i in $(seq 0 255); do > # no need to waste disk space > ln "$pack" "pack-extra-$i.pack" > ln "$idx" "pack-extra-$i.idx" > done > > # and now force an oid_array to store all of it > git cat-file --batch-all-objects --batch-check > > which results in: > > fatal: size_t overflow: 32 * 18446744071562067968 > > So the good news is that st_mult() sees the problem (the large number is > because our int wraps negative, and then that gets cast to a size_t), > doing the job it was meant to: bailing in crazy situations rather than > causing an undersized buffer. > > But we should avoid hitting this case at all, and instead limit > ourselves based on what malloc() is willing to give us. We can easily do > that by switching to size_t. > > The cat-file process above made it to ~120GB virtual set size before the > integer overflow (our internal hash storage is 32-bytes now in > preparation for sha256, so we'd expect ~128GB total needed, plus > potentially more to copy from one realloc'd block to another)). After > this patch (and about 130GB of RAM+swap), it does eventually read in the > whole set. No test for obvious reasons. ;). This patch looks good, and makes immediate sense given your explanation. > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > sha1-array.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/sha1-array.h b/sha1-array.h > index dc1bca9c9a..c5e4b9324f 100644 > --- a/sha1-array.h > +++ b/sha1-array.h > @@ -49,8 +49,8 @@ > */ > struct oid_array { > struct object_id *oid; > - int nr; > - int alloc; > + size_t nr; > + size_t alloc; > int sorted; > }; > > -- > 2.26.0.597.g7e08ed78ff Thanks, Taylor