Re: fatal: Out of memory, getdelim failed under NFS mounts


 



On 10.08.2017 at 16:43, Yaroslav Halchenko wrote:
> On Thu, 10 Aug 2017, René Scharfe wrote:
>> On 09.08.2017 at 19:39, Yaroslav Halchenko wrote:
>>> More context (may be different issue(s)) could be found at
>>> http://git-annex.branchable.com/forum/git-annex_add_out_of_memory_error/
>>> but currently I am consistently reproducing it while running
>>> git (1:2.11.0-3 debian stretch build) within debian stretch singularity
>>> environment [1].
> 
>>> External system is CentOS 6.9, and git 1.7.1 (and 2.0.4, installed via
>>> modules) does not show similar buggy behavior.
> 
>>> NFS mounted partitions are bind mounted inside the singularity space and
>>> when I try to do some git operations, I get that error inconsistently, e.g.
> 
>>> 	yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin master
>>> 	fatal: Out of memory, getdelim failed
>>> 	error: git://github.com/datalad/datalad did not send all necessary objects
> 
>>> 	yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin master
>>> 	fatal: Out of memory, getdelim failed
>>> 	error: git://github.com/datalad/datalad did not send all necessary objects
> 
>>> 	yhalchen@discovery:/mnt/scratch/yoh/datalad$ git pull --ff-only origin master
>>> 	From git://github.com/datalad/datalad
>>> 	 * branch              master     -> FETCH_HEAD
>>> 	fatal: Out of memory, getdelim failed
> 
>>> and sometimes it succeeds.  So it smells like a race condition
>>> somewhere...?
> 
>> I doubt the type of file system matters.
> 
> So far it has been a very consistent indicator.  I did not manage to get
> this error while performing the same operation under /tmp (bound to a
> locally mounted drive), where it also feels faster (again suggesting that
> the original issue is some kind of a race)

Well, there have been bugs in getdelim() before, e.g.:

  https://bugzilla.redhat.com/show_bug.cgi?id=601071
  https://bugzilla.redhat.com/show_bug.cgi?id=1332917

git v2.5.0 was the first version to use it.   So if all else fails it may
be worth compiling git without HAVE_GETDELIM.
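
For illustration, here is a minimal sketch of how a caller can tell a
plain end of file apart from a real getdelim() failure.  The helper name
read_line_checked() is made up for this example and is not git's code;
resetting errno before the call is my own defensive assumption, so that a
stale ENOMEM from some earlier call cannot be mistaken for a fresh error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not taken from git: read one line with getdelim()
 * and report why reading stopped.
 * Returns 1 if a line was read, 0 on end of file, -1 on a real error
 * (in which case errno describes the failure, e.g. ENOMEM).
 */
int read_line_checked(FILE *fp, char **buf, size_t *cap, ssize_t *len)
{
	errno = 0;	/* discard any stale value left by earlier calls */
	*len = getdelim(buf, cap, '\n', fp);
	if (*len >= 0)
		return 1;
	assert(*len == -1);
	if (feof(fp) && !ferror(fp))
		return 0;	/* plain EOF, not a failure */
	return -1;	/* genuine error; errno is meaningful here */
}
```

Call it in a loop with buf = NULL and cap = 0 initially, and free(buf)
once at the end; getdelim() grows the buffer as needed.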

>> The questions are: How much
>> main memory do you have, what is git trying to cram into it, is there
>> a way to reduce the memory footprint or do you need to add more RAM?
>> ... reordered ...
>> "free" and "ulimit -a" can help you find out how much memory you can
>> use.
> 
> I think those aren't the reason:
> 
> yhalchen@discovery:/mnt/scratch/yoh/datalad$ free -h
>                total        used        free      shared  buff/cache   available
> Mem:           126G        2.5G         90G        652K         33G        123G
> Swap:          127G        1.7M        127G

Is all of that available to the git in the Singularity container or
is that the memory size of the host and there's some kind of limit
for the guests?

> yhalchen@discovery:/mnt/scratch/yoh/datalad$ ulimit
> unlimited

That's just the maximum file size; memory-related limits are more
interesting for this case.  "ulimit -a" will show all limits.
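
If the shell inside the container is not trusted to report them, the same
limits can be queried directly with getrlimit(2).  A rough sketch follows;
the function names and label strings are mine, and which limits matter
here is an assumption, I picked the ones that bound malloc().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Print the soft value of one resource limit; returns 0 on success. */
int print_limit(const char *label, int resource)
{
	struct rlimit rl;

	assert(label != NULL);
	if (getrlimit(resource, &rl) != 0) {
		perror(label);
		return -1;
	}
	if (rl.rlim_cur == RLIM_INFINITY)
		printf("%-28s unlimited\n", label);
	else
		printf("%-28s %llu bytes\n", label,
		       (unsigned long long)rl.rlim_cur);
	return 0;
}

/* Print the limits that can make memory allocation fail. */
int print_memory_limits(void)
{
	int rc = 0;

	rc |= print_limit("address space (ulimit -v)", RLIMIT_AS);
	rc |= print_limit("data segment (ulimit -d)", RLIMIT_DATA);
	rc |= print_limit("stack (ulimit -s)", RLIMIT_STACK);
	return rc;
}
```

Note that this prints bytes, while the shell's ulimit reports these
values in units of 1024 bytes.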

>>> any recommendations on how to pin point the "offender"? ;)
>> Running "GIT_TRACE=1 git pull --ff-only origin master" would be a
>> good start, I think, to find out which of the different activities
>> that pull is doing causes the out-of-memory error.
> 
> samples of bad, and then good runs (from eyeballing -- the same until
> error message):
> 
> yhalchen@discovery:/mnt/scratch/yoh$ cat git_trace_bad.log
> 14:05:25.782270 git.c:371               trace: built-in: git 'pull' '--ff-only' 'origin' 'master'
> 14:05:25.795036 run-command.c:350       trace: run_command: 'fetch' '--update-head-ok' 'origin' 'master'
> 14:05:25.795332 exec_cmd.c:116          trace: exec: 'git' 'fetch' '--update-head-ok' 'origin' 'master'
> 14:05:25.797212 git.c:371               trace: built-in: git 'fetch' '--update-head-ok' 'origin' 'master'
> 14:05:25.904088 run-command.c:350       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.085954 run-command.c:350       trace: run_command: 'index-pack' '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.086333 exec_cmd.c:116          trace: exec: 'git' 'index-pack' '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.088382 git.c:371               trace: built-in: git 'index-pack' '--stdin' '--fix-thin' '--keep=fetch-pack 11652 on discovery.hpcc.dartmouth.edu' '--pack_header=2,103'
> 14:05:26.133326 run-command.c:350       trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.133688 exec_cmd.c:116          trace: exec: 'git' 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
> 14:05:26.135493 git.c:371               trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
> fatal: Out of memory, getdelim failed
> error: git://github.com/datalad/datalad did not send all necessary objects

That rev-list call comes from connected.c::check_connected(); the error
message from builtin/fetch.c::store_updated_refs(), which actually calls
check_connected().  So you should be able to reproduce the issue just
with git fetch or with git rev-list.  The latter requires passing the
right objects to the command, but perhaps reproduction is possible with
guessed or arbitrary values.

I don't know which files these commands access with getdelim() except
for the ones mentioned below, though.

>> Also: What does "wc -L .git/FETCH_HEAD .git/packed-refs" report?
> 
> "varying" and not consistent with causing an error (first trials, where I
> did not cat .git/FETCH_HEAD kinda suggested differently):
> 
> yhalchen@discovery:/mnt/scratch/yoh/datalad$ cat .git/FETCH_HEAD; wc -L .git/FETCH_HEAD .git/packed-refs; git pull --ff-only origin master
> 1f90ef474ee200befea19ba77242fa44f16739f0                branch 'master' of git://github.com/datalad/datalad
>   107 .git/FETCH_HEAD
>    90 .git/packed-refs
>   107 total

These line lengths are unlikely to exhaust the memory.  I was rather
hoping for values in the range of billions due to some kind of freak
accident or unconventional refs usage.
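
For files where wc is not at hand, the same check can be sketched in a
few lines of C.  The function name is hypothetical, and unlike "wc -L"
this counts bytes rather than display columns (tabs are not expanded), so
the numbers can differ slightly from the ones above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the length in bytes of the longest line, newline excluded. */
size_t longest_line_bytes(FILE *fp)
{
	char *buf = NULL;
	size_t cap = 0, longest = 0;
	ssize_t len;

	assert(fp != NULL);
	while ((len = getline(&buf, &cap, fp)) > 0) {
		size_t n = (size_t)len;

		if (buf[n - 1] == '\n')
			n--;	/* do not count the line terminator */
		if (n > longest)
			longest = n;
	}
	free(buf);
	return longest;
}
```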

> NB: is there a diff tool that accepts regexes for parts of a line to
> ignore while still showing the original lines, i.e. an answer to
> https://stackoverflow.com/questions/15841223/diff-while-ignoring-patterns-within-a-line-but-not-the-entire-line
> ?
> 
> FWIW -- here is a diff (from good to bad run) of strace (with pid/address
> info changed to stay the same for comparisons):
> 
> @@ -3121,6 +3118,7 @@
>   [pid YYYYY] close(3)                    = 0
>   [pid YYYYY] brk(NULL)                   = 0xXXXXX
>   [pid YYYYY] brk(0xXXXXX)         = 0xXXXXX
> +[pid YYYYY] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xXXXXX

That looks like malloc() increasing its pool by 1 MB...

>   [pid YYYYY] open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
>   [pid YYYYY] open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
>   [pid YYYYY] open("/usr/lib/locale/en_US/LC_MESSAGES", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
> @@ -3144,11 +3142,9 @@
>   [pid YYYYY] lstat(".git/commondir", 0xXXXXX) = -1 ENOENT (No such file or directory)
>   [pid YYYYY] open(".git/config", O_RDONLY) = 3
>   [pid YYYYY] fstat(3, {st_mode=S_IFREG|0644, st_size=257, ...}) = 0
> -[pid YYYYY] mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xXXXXX
>   [pid YYYYY] read(3, "[core]\n\trepositoryformatversion "..., 524288) = 257
>   [pid YYYYY] read(3, "", 524288)         = 0
>   [pid YYYYY] close(3)                    = 0
> -[pid YYYYY] munmap(0xXXXXX, 528384) = 0

... and in the working case it just gets 516 KB and releases it
shortly afterwards...

>   [pid YYYYY] stat(".", {st_mode=S_IFDIR|0755, st_size=907, ...}) = 0
>   [pid YYYYY] getcwd("/mnt/scratch/yoh/datalad", 129) = 25
>   [pid YYYYY] chdir(".")                  = 0
> @@ -3163,7 +3159,6 @@
>   [pid YYYYY] access(".git/config", R_OK) = 0
>   [pid YYYYY] open(".git/config", O_RDONLY) = 3
>   [pid YYYYY] fstat(3, {st_mode=S_IFREG|0644, st_size=257, ...}) = 0
> -[pid YYYYY] brk(0xXXXXX)         = 0xXXXXX
>   [pid YYYYY] read(3, "[core]\n\trepositoryformatversion "..., 524288) = 257
>   [pid YYYYY] read(3, "", 524288)         = 0
>   [pid YYYYY] close(3)                    = 0
> @@ -3180,245 +3175,48 @@
>   [pid YYYYY] read(0, "1f90ef474ee200befea19ba77242fa44"..., 4096) = 82
>   [pid YYYYY] open(".git/refs/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
>   [pid YYYYY] fstat(3, {st_mode=S_IFDIR|0755, st_size=70, ...}) = 0
> -[pid YYYYY] brk(0xXXXXX)         = 0xXXXXX
> +[pid YYYYY] mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xXXXXX
>   [pid YYYYY] getdents(3, /* 5 entries */, 524288) = 136
>   [pid YYYYY] stat(".git/refs/heads", {st_mode=S_IFDIR|0755, st_size=24, ...}) = 0
>   [pid YYYYY] stat(".git/refs/remotes", {st_mode=S_IFDIR|0755, st_size=24, ...}) = 0
>   [pid YYYYY] stat(".git/refs/tags", {st_mode=S_IFDIR|0755, st_size=47, ...}) = 0
>   [pid YYYYY] getdents(3, /* 0 entries */, 524288) = 0
> +[pid YYYYY] munmap(0xXXXXX, 528384) = 0

... which is done in the non-working case as well, just a bit
later...

>   [pid YYYYY] close(3)                    = 0
>   [pid YYYYY] open(".git/refs/bisect", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
>   [pid YYYYY] open(".git/packed-refs", O_RDONLY) = 3
>   [pid YYYYY] fstat(3, {st_mode=S_IFREG|0644, st_size=5042, ...}) = 0
>   [pid YYYYY] fstat(3, {st_mode=S_IFREG|0644, st_size=5042, ...}) = 0
> -[pid YYYYY] read(3, "# pack-refs with: peeled fully-p"..., 524288) = 5042

... and it doesn't seem to reach the stage when the packed refs are read.

So the bad case has malloc() grab and never release one more megabyte.
Is there a limit for Singularity containers?  Can you increase the
one for yours by 1 MB? :)

René


