Hi Deepak and others,

On Sat, Feb 21, 2015 at 10:00:18PM +0530, Deepak Shetty wrote:
> Hi All,
> I am looking for some help from the glusterfs side for the Out of Memory
> (OOM) issue we are seeing when using GlusterFS as a storage backend for
> openstack Cinder (block storage service)

This seems to be using the FUSE mount points for the VM images, right?

> openstack has an upstream CI env managed by the openstack infra team, where
> we added a new job that creates a devstack env (openstack all-in-one, for
> newbies) and configures the block service (Cinder) with GlusterFS as the
> storage backend. Once set up, the CI job runs openstack tempest (the
> integration test suite of openstack), which does API-level testing of the
> whole openstack env.
>
> As part of that testing, ~1.5 to 2 hours into the run, the tempest job
> (VM) hits OOM and the kernel oom-killer kills the process with the most
> memory to reduce memory pressure.
>
> The tempest job is based on CentOS 7 and uses glusterfs 3.6.2 as the
> storage backend for openstack Cinder.
>
> The openstack-dev thread @
> http://thread.gmane.org/gmane.comp.cloud.openstack.devel/46861 has
> details, including links to the logs captured from the tempest jobs.

http://fungi.yuggoth.org/tmp/logs2.tar seems to be the most useful.

> Per the openstack infra folks, they have other jobs based off CentOS 7 that
> don't hit this issue; the only change we are adding is configuring cinder
> with glusterfs when this happens, so right now glusterfs is in the
> spotlight for causing this.

Well, there definitely is something wrong on the Gluster side. After the
'blackout' between 17:47 - 18:17, this is logged in logs/screen-c-vol.txt.gz:

    mount -t glusterfs 127.0.0.1:/vol1
    /opt/stack/data/cinder/mnt/f4c6ad7e3bba4ad1195b3b538efab64a" returned: 32

32 as exit code from "mount" means "mount failure".
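For what it's worth, a hypothetical retry helper (the function name and
the 5-second probe timeout are made up, not anything Cinder actually does)
showing why the re-mount attempts keep returning 32: while the stale,
disconnected FUSE mountpoint is still attached, the target cannot be
reused, so a lazy unmount is needed before a new mount can succeed.

```shell
#!/bin/sh
# Sketch only: detect a dead FUSE mountpoint before retrying the mount.
remount_gluster() {
    mnt="$1" vol="$2"
    if mountpoint -q "$mnt"; then
        # stat errors out (or hangs) on a disconnected FUSE mountpoint
        if ! timeout 5 stat "$mnt" >/dev/null 2>&1; then
            umount -l "$mnt"    # lazy-unmount the stale mountpoint first
        fi
    fi
    mount -t glusterfs "$vol" "$mnt"
    echo "mount exit code: $?"
}

remount_gluster /opt/stack/data/cinder/mnt/f4c6ad7e3bba4ad1195b3b538efab64a \
    127.0.0.1:/vol1
```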
The mountpoint would log details in
logs/glusterfs/opt-stack-data-cinder-mnt-f4c6ad7e3bba4ad1195b3b538efab64a.log.
At 18:00 in that file, there is this message:

    0-vol1-client-0: server 104.130.30.179:49152 has not responded in the
    last 42 seconds, disconnecting.

It's a one-brick volume, so this error is pretty fatal. It does not cause
an unmount, but the mountpoint will return errors. Subsequent mount
attempts will fail because the mountpoint has not been unmounted yet: the
test to see whether the target directory can be used to mount a volume
fails, causing the mount process to return error 32.

A little later, the (still) existing FUSE mount client manages to connect
to the brick again:

    0-vol1-client-0: changing port to 49152 (from 0)

The log of the brick does not contain anything useful for this time :-/

> I am looking through the logs trying to correlate syslog, dstat and
> tempest info to figure out the state of the system and what was happening
> at and before the OOM to get any clues, but wanted to start this thread in
> gluster-devel to see if others can pitch in with their ideas to accelerate
> the debugging and help root cause it.

I have not yet found any pointer to what could have taken place between
17:47-18:17. It seems that network I/O stopped (ping timeout in FUSE).
One of the few logs I came across that contains messages during this
'blackout' is logs/screen-n-cond.txt.gz, but I assume those messages are
triggered by the underlying network issues.

You mention that there was an OOM? Which logs contain details about that?
Also, could you explain what test (workload) was running in the VM(s?) on
the FUSE mountpoint?

In logs/screen-dstat.txt.gz there seems to be a process called
"cinder-backup" running around that time. From the dstat details, I guess
that this process causes the reading of files from the Gluster volume.
When the system becomes responsive again, dstat shows 11G as "dsk/total
read"; it jumped from ~400M in 30 minutes.
If this amount of data is not flushed but kept in memory, I don't doubt
that an OOM could be triggered. Do you know where this backup is written
to?

If we can set up a reproducer environment that emulates the workload,
things would become much easier to debug :)

One of the first things I would try is to configure the kernel to flush
dirty memory more aggressively ('sysctl' values for the 'vm.*' options).
Another thing that may help to identify the memory usage is capturing the
output of 'slabtop --once' at a 1-minute interval.

Possibly you could run the same test on a higher-spec'd (more RAM?)
machine and capture more details. Even if there is no OOM, the memory
usage pattern should look similar and might aid the investigation.

HTH,
Niels

> Also pasting the relevant part of the chat log I had with the infra folks:
>
> Feb 20 21:46:28 <sdague> deepakcs: you are at 70% wait time at the
> end of that
> Feb 20 21:46:37 <sdague> so your io system is just gone bonkers
> Feb 20 21:47:14 <fungi> sdague: that would explain why the console login
> prompt and ssh daemon both stopped working, and the df loop I had going in
> my second ssh session hung around the same time
> Feb 20 21:47:26 <sdague> yeh, dstat even says it's skipping ticks there
> Feb 20 21:47:29 <sdague> for that reason
> Feb 20 21:47:48 <fungi> likely complete i/o starvation for an extended
> period at around that timeframe
> Feb 20 21:48:05 <fungi> that would also definitely cause jenkins to give
> up on the worker if it persisted for very long at all
> Feb 20 21:48:09 <sdague> yeh, cached memory is down to double digit M
> Feb 20 21:49:21 <sdague> deepakcs: so, honestly, what it means to me
> is that glusterfs may be too inefficient to function in this environment
> Feb 20 21:49:34 <sdague> because it's kind of a constrained
> environment
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
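P.S. A sketch of the 'vm.*' tuning I mean, as a sysctl.d fragment; the
file name and the values are illustrative starting points I made up, not
recommendations:

```shell
# /etc/sysctl.d/99-gluster-oom-debug.conf  (hypothetical file name)
# Start background writeback much earlier than the defaults, and block
# heavy writers sooner, so dirty/cached data from the backup workload
# cannot pile up until the oom-killer steps in.
vm.dirty_background_ratio = 2
vm.dirty_ratio = 5

# Apply with: sysctl --system
# Sample slab usage in parallel, e.g.:
#   while sleep 60; do date; slabtop --once | head -n 20; done >> slabtop.log
```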