On 07/19/2012 01:46 AM, Andreas Kurz wrote:
> Hi,
>
> I'm running GlusterFS 3.2.6 in AWS on CentOS 6.2 in a
> distributed/replicated setup.
>
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 4 x 2 = 8
>
> Whenever geo-replication is activated, the corresponding glusterfs
> process on the server starts eating memory (and using a lot of CPU)
> until the OOM killer strikes back.
>
> This happens once users start changing files via the glusterfs mount
> and gsyncd starts crawling through the directory tree looking for
> changes. Network traffic between the servers is quite high, typically
> 10 Mbit/s ... the vast majority are lookup and getxattr requests from
> the server running geo-replication.
>
> I also created a state dump (5 MB bzip2 archive) of this glusterfs
> process when it was using about 9 GB; if that is needed for debugging,
> I can upload it somewhere (Bugzilla?). Dropping dentries and inodes
> reclaims about 1 GB.
>
> Any ideas? A bug? Any recommended tunings, maybe a gsyncd option? I
> changed these values:
>
> performance.stat-prefetch: off
> performance.quick-read: off
> performance.cache-refresh-timeout: 1
> performance.read-ahead: off
> geo-replication.indexing: on
> nfs.disable: on
> network.ping-timeout: 10
> performance.cache-size: 1073741824

Can you also try with performance.io-cache set to off? If that doesn't
show any improvement, please raise a bug and attach the statedump to it.

Thanks,
Vijay
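
For reference, a minimal sketch of the commands involved, assuming a
volume named "myvol" (a placeholder) and GlusterFS 3.2.x defaults,
where a statedump is triggered by signalling the glusterfs process:

    # Disable the io-cache translator as suggested ("myvol" stands in
    # for the actual volume name):
    gluster volume set myvol performance.io-cache off

    # Trigger a fresh statedump by sending SIGUSR1 to the glusterfs
    # process; on 3.2.x the dump files are typically written to /tmp:
    kill -USR1 $(pidof glusterfs)

    # The ~1 GB reclaim from "dropping dentries and inodes" mentioned
    # above corresponds to the kernel VM interface:
    echo 2 > /proc/sys/vm/drop_caches

Note that pidof may return several glusterfs PIDs; in that case pick
the one whose resident memory is growing before sending the signal.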