Hi All,
I am looking for some help from the glusterfs side for an Out of Memory (OOM) issue hit during tempest testing. As part of that testing, ~1.5 to 2 hours into the run, the tempest job (VM) hits OOM and the kernel oom-killer kills the process with the largest memory footprint to relieve memory pressure.
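For reference, here is a minimal sketch (assuming a standard Linux /proc filesystem; the script itself is hypothetical, not something the job runs) of how one can rank the running processes by the kernel's OOM badness score to see which one the oom-killer would likely pick:

#!/usr/bin/env python
# Sketch: rank processes by the kernel's OOM badness score.
# Assumes a Linux /proc filesystem; /proc/<pid>/oom_score and
# /proc/<pid>/comm are standard entries on recent kernels.
import os

scores = []
for pid in filter(str.isdigit, os.listdir('/proc')):
    try:
        with open('/proc/%s/oom_score' % pid) as f:
            score = int(f.read().strip())
        with open('/proc/%s/comm' % pid) as f:
            comm = f.read().strip()
    except (IOError, ValueError):
        continue  # process exited or entry unreadable; skip it
    scores.append((score, int(pid), comm))

# Highest score first: these are the likeliest oom-killer victims.
for score, pid, comm in sorted(scores, reverse=True)[:10]:
    print('%6d  %6d  %s' % (score, pid, comm))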
I am looking through the logs, trying to correlate syslog, dstat, and tempest info to figure out the state of the system and what was happening at and before the OOM, to get any clues. But I wanted to start this thread on gluster-devel to see if others can pitch in with their ideas, to accelerate the debugging and help root-cause the issue.
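As a starting point for that correlation, here is a rough sketch (the syslog path and the exact message patterns are assumptions about this particular setup; adjust them to the files actually collected from the job) that pulls the oom-killer events out of syslog so their timestamps can be lined up against the dstat samples:

#!/usr/bin/env python
# Sketch: extract oom-killer events from syslog for timestamp correlation.
# SYSLOG path and PATTERNS are assumptions about this setup.
SYSLOG = '/var/log/syslog'
PATTERNS = ('invoked oom-killer', 'Out of memory')

with open(SYSLOG) as log:
    for line in log:
        if any(p in line for p in PATTERNS):
            # syslog lines start with a timestamp like "Feb 20 21:46:28",
            # which can be matched against the dstat sample times.
            print(line.rstrip())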
Feb 20 21:46:28 <sdague> deepakcs: you are at 70% wait time at the end of that
Feb 20 21:46:37 <sdague> so your io system is just gone bonkers
Feb 20 21:47:14 <fungi> sdague: that would explain why the console login prompt and ssh daemon both stopped working, and the df loop I had going in my second ssh session hung around the same time
Feb 20 21:47:26 <sdague> yeh, dstat even says it's skipping ticks there
Feb 20 21:47:29 <sdague> for that reason
Feb 20 21:47:48 <fungi> likely complete i/o starvation for an extended period at around that timeframe
Feb 20 21:48:05 <fungi> that would also definitely cause jenkins to give up on the worker if it persisted for very long at all
Feb 20 21:48:09 <sdague> yeh, cached memory is down to double digit M
Feb 20 21:49:21 <sdague> deepakcs: so, honestly, what it means to me is that glusterfs may be too inefficient to function in this environment
Feb 20 21:49:34 <sdague> because it's kind of a constrained environment
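In case it helps others check the same readings, here is a rough sketch (assuming dstat was started with --output to produce a CSV, and that the default cpu and memory stat groups were enabled, both of which are assumptions about this job) that scans the dstat CSV for the samples where iowait spikes and cached memory collapses, as sdague points out above:

#!/usr/bin/env python
# Sketch: flag dstat CSV samples with high iowait or a collapsed page cache.
# Column names ("wai", "cach") match dstat's default cpu/memory groups;
# whether those groups were enabled in this job is an assumption.
import csv

DSTAT_CSV = 'dstat.csv'   # hypothetical path to the collected dstat output

with open(DSTAT_CSV) as f:
    rows = list(csv.reader(f))

# dstat CSVs carry a few banner lines before the real header row.
header_idx = next((i for i, r in enumerate(rows) if 'wai' in r and 'cach' in r), None)
if header_idx is None:
    raise SystemExit('cpu "wai" / memory "cach" columns not found; '
                     'dstat may have been run with different stat groups')
header = rows[header_idx]
wai, cach = header.index('wai'), header.index('cach')

for row in rows[header_idx + 1:]:
    if len(row) <= max(wai, cach) or not row[wai]:
        continue
    # Flag samples with >50% iowait or cached memory under ~100 MB (bytes in CSV).
    if float(row[wai]) > 50 or float(row[cach]) < 100 * 1024 * 1024:
        print(','.join(row))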