On Fri, Oct 01, 2021 at 10:30:24AM +0800, Jiang Xin wrote: > > Maybe an unfair knee-jerk reaction: I think we should really be pushing > > this sort of thing into pre-receive hooks and/or the proc-receive hook, > > i.e. see 15d3af5e22e (receive-pack: add new proc-receive hook, > > 2020-08-27). > > Last week, one user complained that he cannot push to his repo in our > server, and later Han Xin discovered the user was trying to push a > very big blob object over 10GB. For this case, the "pre-receive" hook > had no change to execute because "git-receive-pack" died early because > of OOM. The function "unpack_non_delta_entry()" in > "builtin/unpack-objects.c" will try to allocate memory for the whole > 10GB blob but no lucky. > > Han Xin is preparing another patch to resolve the OOM issue found in > "unpack_non_delta_entry()". But we think it is reasonable to prevent > such a big blob in a pack to git-receive-pack, because it will be > slower to check objects from pack and loose objects in the quarantine > using pre-receive hook. I think index-pack handles this case correctly, at least for base objects. In unpack_entry_data(), it will stream anything bigger than big_file_threshold. Probably unpack-objects needs to learn the same trick. In general, the code in index-pack has gotten a _lot_ more attention over the years than unpack-objects. I'd trust its security a lot more, and it has extra performance enhancements (like multithreading and streaming). At GitHub, we always run index-pack on incoming packs, and never unpack-objects. I'm tempted to say we should stop using unpack-objects entirely for incoming packs, and then either: - just keep packs for incoming objects. It's not really any worse of a state than loose objects, and it all eventually gets rolled up by gc anyway. (The exception is that thin packs may duplicate base objects until that rollup). - teach index-pack an "--unpack" option. I actually wrote some patches for this a while back. It's not very much code, but there were some rough edges and I never came back to them. I'm happy to share if anybody's interested. Though I would note that for deltas, even index-pack will not stream the contents. You generally shouldn't have very large deltas, as clients will also avoid computing them (because that also involves putting the whole object in memory). But the notion of "large" is not necessarily the same between client and server. And of course somebody can maliciously make a giant delta. -Peff