Hi Deepak and others,

On Sat, Feb 21, 2015 at 10:00:18PM +0530, Deepak Shetty wrote:
> Hi All,
> I am looking for some help from the glusterfs side for the Out of Memory
> (OOM) issue we are seeing when using GlusterFS as a storage backend for
> openstack Cinder (block storage service)

This seems to be using the FUSE mount points for the VM images, right?

> openstack has an upstream CI env managed by the openstack infra team, where
> we added a new job that creates a devstack env (openstack all-in-one, for
> newbies) and configures the block service (Cinder) with GlusterFS as the
> storage backend. Once set up, the CI job runs openstack tempest (the
> integration test suite of openstack), which does API-level testing of the
> whole openstack env.
>
> As part of that testing, ~1.5 to 2 hours into the run, the tempest job
> (VM) hits OOM and the kernel oom-killer kills the process with the most
> memory to reduce memory pressure.
>
> The tempest job is based on CentOS 7 and uses glusterfs 3.6.2 as the
> storage backend for openstack Cinder.
>
> The openstack-dev thread @
> http://thread.gmane.org/gmane.comp.cloud.openstack.devel/46861 has
> details, including links to the logs captured from the tempest jobs.

http://fungi.yuggoth.org/tmp/logs2.tar seems to be the most useful.

> Per the openstack infra folks, they have other jobs based off CentOS 7 that
> don't hit this issue; the only change we are adding is configuring cinder
> with glusterfs when this happens, so right now glusterfs is in the
> spotlight for causing this.

Well, there definitely is something wrong on the Gluster side. After the
'blackout' between 17:47 - 18:17, this is logged in logs/screen-c-vol.txt.gz:

    mount -t glusterfs 127.0.0.1:/vol1
    /opt/stack/data/cinder/mnt/f4c6ad7e3bba4ad1195b3b538efab64a" returned: 32

32 as exit code from "mount" means "mount failure".
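For what it's worth, a hypothetical retry helper (the function name and
the 5-second probe timeout are made up, not anything Cinder actually does)
showing why the re-mount attempts keep returning 32: while the stale,
disconnected FUSE mountpoint is still attached, the target cannot be
reused, so a lazy unmount is needed before a new mount can succeed.

```shell
#!/bin/sh
# Sketch only: detect a dead FUSE mountpoint before retrying the mount.
remount_gluster() {
    mnt="$1" vol="$2"
    if mountpoint -q "$mnt"; then
        # stat errors out (or hangs) on a disconnected FUSE mountpoint
        if ! timeout 5 stat "$mnt" >/dev/null 2>&1; then
            umount -l "$mnt"    # lazy-unmount the stale mountpoint first
        fi
    fi
    mount -t glusterfs "$vol" "$mnt"
    echo "mount exit code: $?"
}

remount_gluster /opt/stack/data/cinder/mnt/f4c6ad7e3bba4ad1195b3b538efab64a \
    127.0.0.1:/vol1
```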
The mountpoint would log details in
logs/glusterfs/opt-stack-data-cinder-mnt-f4c6ad7e3bba4ad1195b3b538efab64a.log.
At 18:00 in that file, there is this message:

    0-vol1-client-0: server 104.130.30.179:49152 has not responded in the
    last 42 seconds, disconnecting.

It's a one-brick volume, so this error is pretty fatal. It does not cause
an unmount, but the mountpoint will return errors. Subsequent mount
attempts will fail because the mountpoint has not been unmounted yet: the
test to see whether the target directory can be used to mount a volume
fails, causing the mount process to return error 32.

A little later, the (still) existing FUSE mount client manages to connect
to the brick again:

    0-vol1-client-0: changing port to 49152 (from 0)

The log of the brick does not contain anything useful for this time :-/

> I am looking through the logs trying to correlate syslog, dstat and
> tempest info to figure out the state of the system and what was happening
> at and before the OOM to get any clues, but wanted to start this thread in
> gluster-devel to see if others can pitch in with their ideas to accelerate
> the debugging and help root cause it.

I have not yet found any pointer to what could have taken place between
17:47-18:17. It seems that network I/O stopped (ping timeout in FUSE).
One of the few logs I came across that contains messages during this
'blackout' is logs/screen-n-cond.txt.gz, but I assume those messages are
triggered by the underlying network issues.

You mention that there was an OOM? Which logs contain details about that?
Also, could you explain what test (workload) was running in the VM(s?) on
the FUSE mountpoint?

In logs/screen-dstat.txt.gz there seems to be a process called
"cinder-backup" running around that time. From the dstat details, I guess
that this process causes the reading of files from the Gluster volume.
When the system becomes responsive again, dstat shows 11G as "dsk/total
read"; it jumped from ~400M in 30 minutes.
If this amount of data is not flushed but kept in memory, I don't doubt
that an OOM could be triggered. Do you know where this backup is written
to?

If we can set up a reproducer environment that emulates the workload,
things would become much easier to debug :)

One of the first things I would try is to configure the kernel to flush
dirty memory more aggressively ('sysctl' values for the 'vm.*' options).
Another thing that may help to identify the memory usage is capturing the
output of 'slabtop --once' at a 1-minute interval.

Possibly you could run the same test on a higher-spec'd (more RAM?)
machine and capture more details. Even if there is no OOM, the memory
usage pattern should look similar and might aid the investigation.

HTH,
Niels

> Also pasting the relevant part of the chat log I had with the infra folks:
>
> Feb 20 21:46:28 <sdague> deepakcs: you are at 70% wait time at the
> end of that
> Feb 20 21:46:37 <sdague> so your io system is just gone bonkers
> Feb 20 21:47:14 <fungi> sdague: that would explain why the console login
> prompt and ssh daemon both stopped working, and the df loop I had going in
> my second ssh session hung around the same time
> Feb 20 21:47:26 <sdague> yeh, dstat even says it's skipping ticks there
> Feb 20 21:47:29 <sdague> for that reason
> Feb 20 21:47:48 <fungi> likely complete i/o starvation for an extended
> period at around that timeframe
> Feb 20 21:48:05 <fungi> that would also definitely cause jenkins to give
> up on the worker if it persisted for very long at all
> Feb 20 21:48:09 <sdague> yeh, cached memory is down to double digit M
> Feb 20 21:49:21 <sdague> deepakcs: so, honestly, what it means to me
> is that glusterfs may be too inefficient to function in this environment
> Feb 20 21:49:34 <sdague> because it's kind of a constrained
> environment
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel
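P.S. A sketch of the 'vm.*' tuning I mean, as a sysctl.d fragment; the
file name and the values are illustrative starting points I made up, not
recommendations:

```shell
# /etc/sysctl.d/99-gluster-oom-debug.conf  (hypothetical file name)
# Start background writeback much earlier than the defaults, and block
# heavy writers sooner, so dirty/cached data from the backup workload
# cannot pile up until the oom-killer steps in.
vm.dirty_background_ratio = 2
vm.dirty_ratio = 5

# Apply with: sysctl --system
# Sample slab usage in parallel, e.g.:
#   while sleep 60; do date; slabtop --once | head -n 20; done >> slabtop.log
```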