Han Xin <chiyutianyi@xxxxxxxxx> writes:

> From: Han Xin <hanxin.hx@xxxxxxxxxxxxxxx>
>
> Although we do not recommend users push large binary files to the git
> repositories, it's difficult to prevent them from doing so. Once, we
> found a problem with a surge in memory usage on the server. The source
> of the problem is that a user submitted a single object with a size of
> 15GB. Once someone initiates a git push, the git process will
> immediately allocate 15G of memory, resulting in an OOM risk.
>
> Through further analysis, we found that when we execute git
> unpack-objects, in unpack_non_delta_entry(), "void *buf =
> get_data(size);" will directly allocate memory equal to the size of
> the object. This is quite a scary thing, because the pre-receive hook
> has not been executed at this time, and we cannot avoid this by hooks.
>
> I got inspiration from the deflate process of zlib, maybe it would be
> a good idea to change unpack-objects to stream deflate.

Hi, Jeff.

I hope you can share how GitHub solves this problem. As you said in
your reply at
https://lore.kernel.org/git/YVaw6agcPNclhws8@xxxxxxxxxxxxxxxxxxxxxxx/:
"we don't have a match in unpack-objects, but we always run index-pack
on incoming packs".

In the existing implementation of "index-pack", objects larger than
big_file_threshold are run through a "fixed_buf" of 8192 bytes to
compute their "oid", so the whole object never has to be held in
memory.

I tried the implementation on jk/no-more-unpack-objects, where you
noted:

  /* XXX This will expand too-large objects! */
  if (!data)
          data = new_data = get_data_from_pack(obj_entry);

So if "--unpack" is given, the same risk is still there. When I create
an object larger than 1GB and run index-pack on it, the result is:

  $ GIT_ALLOC_LIMIT=1024m git index-pack --unpack --stdin <large.pack
  fatal: attempting to allocate 1228800001 over limit 1073741824

For reference, I have appended below a sketch of the allocation path in
question and of the kind of chunked inflate loop I have in mind.

Looking forward to your reply.
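For anyone following along, the non-delta path in
builtin/unpack-objects.c that the quoted cover letter refers to boils
down to roughly the following (my paraphrase from memory, so the
details around get_data() may not match the tree exactly):

  static void unpack_non_delta_entry(enum object_type type,
                                     unsigned long size, unsigned nr)
  {
          /*
           * get_data() inflates the whole entry into a single buffer,
           * so a 15GB blob means a single 15GB allocation before any
           * hook has had a chance to reject the push.
           */
          void *buf = get_data(size);

          if (!dry_run && buf)
                  write_object(nr, type, buf, size);
          else
                  free(buf);
  }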
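And this is a minimal, self-contained sketch (plain zlib, not git code;
the names inflate_in_chunks, chunk_fn and the 8192-byte CHUNK are only
illustrative) of what I mean by streaming the inflate: each
decompressed chunk is handed to a consumer, e.g. an incremental hash
update, instead of being accumulated in one big buffer.

  #include <stdio.h>
  #include <zlib.h>

  #define CHUNK 8192

  typedef void (*chunk_fn)(const unsigned char *buf, size_t len, void *ctx);

  /*
   * Inflate a zlib stream from 'in' in CHUNK-sized pieces and hand each
   * piece to 'consume', so peak memory stays at O(CHUNK) no matter how
   * large the uncompressed object is.
   */
  static int inflate_in_chunks(FILE *in, chunk_fn consume, void *ctx)
  {
          unsigned char inbuf[CHUNK], outbuf[CHUNK];
          z_stream zs = { 0 };
          int ret = Z_OK;

          if (inflateInit(&zs) != Z_OK)
                  return -1;

          while (ret != Z_STREAM_END) {
                  zs.avail_in = fread(inbuf, 1, sizeof(inbuf), in);
                  if (ferror(in) || !zs.avail_in)
                          break;  /* read error or truncated stream */
                  zs.next_in = inbuf;

                  do {    /* drain all output for this input chunk */
                          zs.avail_out = sizeof(outbuf);
                          zs.next_out = outbuf;
                          ret = inflate(&zs, Z_NO_FLUSH);
                          if (ret == Z_STREAM_ERROR || ret == Z_NEED_DICT ||
                              ret == Z_DATA_ERROR || ret == Z_MEM_ERROR) {
                                  inflateEnd(&zs);
                                  return -1;
                          }
                          consume(outbuf, sizeof(outbuf) - zs.avail_out, ctx);
                  } while (zs.avail_out == 0);
          }

          inflateEnd(&zs);
          return ret == Z_STREAM_END ? 0 : -1;
  }

The consumer here would be the incremental oid computation (and, on the
writing side, a streaming loose-object writer), which is more or less
what "index-pack" already does for over-threshold blobs with its
8192-byte "fixed_buf".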