On Tue, May 27, 2014 at 11:47 PM, Dale R. Worley <worley@xxxxxxxxxxxx> wrote:
> I've discovered a problem using Git. It's not clear to me what the
> "correct" behavior should be, but it seems to me that Git is failing
> in an undesirable way.
>
> The problem arises when trying to handle a very large file. For
> example:
>
>     $ git --version
>     git version 1.8.3.1
>     $ mkdir $$
>     $ cd $$
>     $ git init
>     Initialized empty Git repository in /common/not-replicated/worley/temp/5627/.git/
>     $ truncate --size=20G big_file
>     $ ls -l
>     total 0
>     -rw-rw-r--. 1 worley worley 21474836480 May 27 11:59 big_file
>     $ time git add big_file
>
>     real 4m48.752s
>     user 4m31.295s
>     sys 0m16.747s
>     $
>
> At this point, either 'git fsck' or 'git commit' fails:
>
>     $ git fsck --full --strict
>     notice: HEAD points to an unborn branch (master)
>     Checking object directories: 100% (256/256), done.
>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

Back trace for this one

#3  0x000000000055cf39 in xmalloc (size=21474836481) at wrapper.c:49
#4  0x000000000055cffd in xmallocz (size=21474836480) at wrapper.c:73
#5  0x0000000000537858 in unpack_compressed_entry (p=0x858ac0,
    w_curs=0x7fffffffc0f8, curpos=18, size=21474836480) at sha1_file.c:1924
#6  0x0000000000538364 in unpack_entry (p=0x858ac0, obj_offset=12,
    final_type=0x7fffffffc1e4, final_size=0x7fffffffc1d8) at sha1_file.c:2206
#7  0x00000000004fb0a2 in verify_packfile (p=0x858ac0,
    w_curs=0x7fffffffc320, fn=0x43f5f2 <fsck_obj_buffer>,
    progress=0x858a90, base_count=0) at pack-check.c:119
#8  0x00000000004fb3f4 in verify_pack (p=0x858ac0,
    fn=0x43f5f2 <fsck_obj_buffer>, progress=0x858a90, base_count=0)
    at pack-check.c:177
#9  0x00000000004401d7 in cmd_fsck (argc=0, argv=0x7fffffffd650,
    prefix=0x0) at builtin/fsck.c:677

Not easy to fix. I started working on converting fsck to use
index-pack code for pack verification. index-pack supports large files
well, so in the end it might fix this (as well as speeding up fsck).
But that work has stalled for a long time.

>
>     $ git commit -m Test.
>     [master (root-commit) 3df3655] Test.
>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

And the backtrace:

#11 0x00000000004b9da0 in read_sha1_file (
    sha1=0x8558a0 "\256/s\324\370\304\344\212\304I\v\342\334MS\002\352\214\061\222",
    type=0x7fffffffc6c4, size=0x8558d0) at cache.h:820
#12 0x00000000004c1b98 in diff_populate_filespec (s=0x8558a0,
    size_only=0) at diff.c:2749
#13 0x00000000004c0110 in diff_filespec_is_binary (one=0x8558a0)
    at diff.c:2188
#14 0x00000000004c0f0b in builtin_diffstat (name_a=0x858530 "big_file",
    name_b=0x0, one=0x8584e0, two=0x8558a0, diffstat=0x7fffffffc8a0,
    o=0x7fffffffce88, p=0x855910) at diff.c:2435
#15 0x00000000004c2fd4 in run_diffstat (p=0x855910, o=0x7fffffffce88,
    diffstat=0x7fffffffc8a0) at diff.c:3168
#16 0x00000000004c603a in diff_flush_stat (p=0x855910,
    o=0x7fffffffce88, diffstat=0x7fffffffc8a0) at diff.c:4081
#17 0x00000000004c70e4 in diff_flush (options=0x7fffffffce88)
    at diff.c:4520
#18 0x00000000004e5d59 in log_tree_diff_flush (opt=0x7fffffffcaf0)
    at log-tree.c:715
#19 0x00000000004e5e5a in log_tree_diff (opt=0x7fffffffcaf0,
    commit=0x8585b0, log=0x7fffffffc9a0) at log-tree.c:747
#20 0x00000000004e60b1 in log_tree_commit (opt=0x7fffffffcaf0,
    commit=0x8585b0) at log-tree.c:810
#21 0x000000000042c45c in print_summary (prefix=0x0,
    sha1=0x7fffffffd300 ".&Gȑ\360\243\202\351&!\035\312q\374\345\314LL)",
    initial_commit=1) at builtin/commit.c:1426
#22 0x000000000042d213 in cmd_commit (argc=0, argv=0x7fffffffd650,
    prefix=0x0) at builtin/commit.c:1750

If read_sha1_file had an option to read at most <n> bytes (enough for
binary detection), it would fix this. Another option is to declare all
files larger than core.bigfilethreshold binary. The latter is easier in
both senses: implementation cost and looseness.
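
To illustrate the first idea (inspect only a bounded prefix instead of
loading the whole blob), here is a minimal sketch in plain C. It works
on an ordinary file, not on git's object store. read_sha1_file has no
such option today, and the names PEEK_LIMIT, prefix_looks_binary and
file_looks_binary are hypothetical, as is the 8000-byte cap. The
"contains a NUL byte" heuristic is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cap on how many bytes we are willing to inspect. */
#define PEEK_LIMIT 8000

/* Return 1 if the first `size` bytes contain a NUL byte, else 0. */
static int prefix_looks_binary(const char *buf, size_t size)
{
	return memchr(buf, 0, size) != NULL;
}

/*
 * Read at most PEEK_LIMIT bytes from `path` and guess whether the
 * file is binary. Memory use is bounded by PEEK_LIMIT regardless of
 * how large the file is.
 * Returns 1 for binary, 0 for text, -1 on I/O error.
 */
static int file_looks_binary(const char *path)
{
	char buf[PEEK_LIMIT];
	size_t n;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "rb");
	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf), f);
	if (ferror(f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return prefix_looks_binary(buf, n);
}
```

Calling file_looks_binary("big_file") on a file like the 20G one above
would read only the first PEEK_LIMIT bytes, so the allocation no longer
scales with the file size.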

> Even doing a 'git reset' does not put the repository in a state where
> 'git fsck' will complete:
>
>     $ git reset
>     $ git fsck --full --strict
>     notice: HEAD points to an unborn branch (master)
>     Checking object directories: 100% (256/256), done.
>     fatal: Out of memory, malloc failed (tried to allocate 21474836481 bytes)

I don't know how many commands are hit by this. If you have time and
gdb, please put a breakpoint in the die_builtin() function and send
backtraces for the commands that fail. You could speed up the process
by creating a smaller file and setting the environment variable
GIT_ALLOC_LIMIT (in kilobytes) to a number lower than that size. If git
attempts to allocate a block larger than that limit, it will die.
--
Duy