Re: Surprising use of memory and time when repacking mozilla's gecko repository

On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote:
> Hi,
> 
> I was looking at the disk size of the gecko repository on github[1],
> which started at 4.7GB, and ran `git gc --aggressive` on it, which
> brought it down to 2.0GB. But achieving that required quite a lot of
> resources.
> 
> My first attempt failed with OOM, on an AWS instance with 16 cores and
> 32GB RAM. I then went to another AWS instance, with 36 cores and 96GB
> RAM. And that went through after a while... with a peak memory usage
> above 60GB!
> 
> Since then, Peff kindly repacked the repo on the github end, so it
> doesn't really need repacking locally anymore, but I can still reproduce
> the > 60GB memory usage with the packed repository.
> 
> I gathered some data[2], all on the same 36 cores, 96GB RAM instance, with
> 36, 16 and 1 threads, and here's what can be observed:
> 
> With 36 threads, the overall process takes 45 minutes:
> - 50 seconds enumerating and counting objects.
> - ~22 minutes compressing objects
> - ~22 minutes writing objects
> 
> Of the 22 minutes compressing objects, more than 15 minutes are spent on
> the last percent of objects, and only during that part does memory usage
> balloon above 20GB.
> 
> Memory usage drops back to 2.4GB once compression finishes.
> 
> With 16 threads, the overall process takes about the same time as above,
> with about the same breakdown between phases.
> 
> But less time is spent on compressing the last percent of objects, and
> memory usage goes above 20GB later than with 36 threads.
> 
> Finally, with 1 thread, the picture changes greatly. The overall process
> takes 2.5h:
> - 50 seconds enumerating and counting objects.
> - ~2.5h compressing objects.
> - 3 minutes and 25 seconds writing objects!
> 
> Memory usage stays reasonable, except that at some point after 47 minutes
> it starts climbing to 12.7GB, and then goes back down about half an hour
> later, all while stalling around the 13% progress mark.
> 
> My guess is all those stalls are happening when processing the files I
> already had problems with in the past[3], except there are more of them
> now (thankfully, they were removed, so there won't be more, but that
> doesn't make the existing ones go away).
> 
> I never ended up working on making that diff faster; that might help a
> little here, but it would probably not help much with the memory usage.
> I wonder what git could reasonably do to avoid OOMing in this case.
> Reduce the window size temporarily? Trade memory for time, by not
> keeping the objects in memory?
> 
> I'm puzzled by the fact that writing objects is so much faster with 1 thread.

Here's a perf report from the particularly slow portion of "Writing
objects", in the run where compression happened on 36 threads:
  100.00%     0.00%  git      [unknown]           [k] 0xffffffffffffffff                    
   99.97%     0.02%  git      git                 [.] write_one                             
   99.97%     0.00%  git      git                 [.] write_pack_file                       
   99.97%     0.00%  git      git                 [.] cmd_pack_objects                      
   99.96%     0.00%  git      git                 [.] write_object (inlined)                
   99.96%     0.00%  git      git                 [.] write_reuse_object (inlined)          
   99.92%     0.00%  git      git                 [.] write_no_reuse_object                 
   98.12%     0.00%  git      git                 [.] get_delta (inlined)                   
   72.36%     0.00%  git      git                 [.] diff_delta (inlined)                  
   64.86%    64.20%  git      git                 [.] create_delta_index                    
   26.32%     0.00%  git      git                 [.] repo_read_object_file (inlined)       
   26.32%     0.00%  git      git                 [.] read_object_file_extended             
   26.32%     0.00%  git      git                 [.] read_object                           
   26.32%     0.00%  git      git                 [.] oid_object_info_extended              
   26.25%     0.00%  git      git                 [.] packed_object_info                    
   26.24%     0.00%  git      git                 [.] cache_or_unpack_entry (inlined)       
   24.30%     0.01%  git      git                 [.] unpack_entry                          
   17.62%     0.00%  git      git                 [.] memcpy (inlined)                      
   17.52%    17.46%  git      libc-2.27.so        [.] __memmove_avx_unaligned_erms          
   15.98%     0.22%  git      git                 [.] patch_delta                           
    7.60%     0.00%  git      git                 [.] unpack_compressed_entry               
    7.49%     7.42%  git      git                 [.] create_delta                          
    7.29%     0.00%  git      git                 [.] git_inflate                           
    7.29%     0.23%  git      libz.so.1.2.11      [.] inflate                               
    1.94%     0.00%  git      git                 [.] xmemdupz                              
    1.14%     0.00%  git      git                 [.] do_compress                           
    0.98%     0.98%  git      libz.so.1.2.11      [.] adler32_z                             
    0.95%     0.00%  git      libz.so.1.2.11      [.] deflate                               

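... that's a large portion of time spent on deltas: about 98% of it is
under get_delta(), with create_delta_index() alone at ~65% and reading
the objects back (read_object_file_extended) at ~26%.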

Mike