Re: [Gluster-devel] GlusterFS FUSE client leaks summary — part I

Raghavendra G <raghavendra@xxxxxxxxxxx> · Thu, 4 Feb 2016 14:16:45 +0530

On Mon, Feb 1, 2016 at 2:24 PM, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:

On 02/01/2016 01:39 PM, Oleksandr Natalenko wrote:

Wait. It seems to be my bad.

Before unmounting I do drop_caches (2), and glusterfs process CPU usage goes to 100% for a while. I hadn't waited for it to drop to 0%, and instead performed the unmount. It seems glusterfs is purging inodes and that's why it uses 100% of CPU. I've re-tested it, waiting for CPU usage to become normal, and got no leaks.
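
The "drop caches, wait for the CPU to settle, then unmount" sequence described above could be scripted roughly as follows. This is only a sketch under assumptions: the helper names, the 5% threshold and the 2 second sampling interval are made up for illustration, and the PID of the glusterfs client process has to be supplied by the caller.

```python
import os
import time


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is in parentheses and may contain spaces,
    # so split on the last ')' and index the remaining fields from there.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage in percent of one core over the sampling interval."""
    return 100.0 * (ticks_after - ticks_before) / (interval_s * ticks_per_s)


def read_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())


def wait_until_idle(pid, threshold=5.0, interval_s=2.0):
    """Block until the process uses less than `threshold` percent CPU.

    threshold and interval_s are illustrative defaults, not tuned values.
    """
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    before = read_ticks(pid)
    while True:
        time.sleep(interval_s)
        after = read_ticks(pid)
        if cpu_percent(before, after, interval_s, ticks_per_s) < threshold:
            return
        before = after


def drop_dentries_and_inodes():
    # Writing 2 asks the kernel to free reclaimable dentries and inodes
    # (requires root).
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("2\n")
```

Calling drop_dentries_and_inodes() followed by wait_until_idle(pid), and only then unmounting, mirrors the re-test above.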

Will verify this once again and report more.

BTW, if that works, how could I limit the inode cache for the FUSE client? I do not want it to go beyond 1G, for example, even if I have 48G of RAM on my server.

It's hard-coded for now. For fuse, the lru limit (for inodes which are not active) is 32*1024.

One way to address this (which we were discussing earlier) is to have an option to configure the inode cache limit.

We cannot set a limit on the inode cache in fuse-bridge. As long as the kernel is aware of an inode (through a lookup), fuse-client is _forced_ to keep that inode in its inode table. The reason is that we pass the address of the inode object to the kernel as the nodeid. We cannot send a gfid as the nodeid, since a gfid is 128 bits and a nodeid is 64 bits. This is the reason for setting an infinite lru limit. However, this problem does not exist for inode table management on the server, as the client can communicate with the server using 128-bit gfids.
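
To make the constraint concrete, here is a toy Python model, not GlusterFS code. The class ToyInodeTable and the helper fits_in_nodeid are hypothetical names, and Python's id() merely stands in for the address of the C inode object. It shows that a 128-bit gfid does not fit in a 64-bit nodeid, and that an entry can only leave the table once the kernel has forgotten every lookup it was given.

```python
import uuid

NODEID_BITS = 64  # width of the nodeid exchanged with the kernel


def fits_in_nodeid(value):
    """True if an integer can be represented in a 64-bit nodeid."""
    return 0 <= value < (1 << NODEID_BITS)


class ToyInodeTable:
    """Toy model: the nodeid given to the kernel identifies the inode object."""

    def __init__(self):
        self._by_gfid = {}
        self._by_nodeid = {}

    def lookup(self, gfid):
        inode = self._by_gfid.get(gfid)
        if inode is None:
            inode = {"gfid": gfid, "nlookup": 0}
            self._by_gfid[gfid] = inode
            # id() stands in for the address of the inode object in C.
            self._by_nodeid[id(inode)] = inode
        inode["nlookup"] += 1
        return id(inode)

    def resolve(self, nodeid):
        # Only works while the inode object is still in the table.
        return self._by_nodeid[nodeid]

    def forget(self, nodeid, nlookup):
        inode = self._by_nodeid[nodeid]
        inode["nlookup"] -= nlookup
        if inode["nlookup"] == 0:
            # Only now is it safe to drop the entry.
            del self._by_nodeid[nodeid]
            del self._by_gfid[inode["gfid"]]

    def __len__(self):
        return len(self._by_nodeid)


gfid = uuid.UUID("ffffffff-ffff-4fff-bfff-ffffffffffff")
table = ToyInodeTable()
nodeid = table.lookup(gfid)
print(fits_in_nodeid(gfid.int), fits_in_nodeid(nodeid))
```

In this model, evicting an entry before the matching forget would leave the kernel holding a nodeid that resolve() can no longer map back to an inode, which is the situation an lru limit in fuse-bridge would create.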

If that sounds good, we can then check whether it has to be global/volume-level, client/server/both.

Thanks,

Soumya

01.02.2016 09:54, Soumya Koduri wrote:

On 01/31/2016 03:05 PM, Oleksandr Natalenko wrote:

Unfortunately, this patch doesn't help.

RAM usage when "find" finishes is ~9G.

Here is statedump before drop_caches: https://gist.github.com/fc1647de0982ab447e20

[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=706766688
num_allocs=2454051

And after drop_caches: https://gist.github.com/5eab63bc13f78787ed19

[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=550996416
num_allocs=1913182

There isn't a significant drop in inode contexts. One of the reasons could be dentries holding a refcount on the inodes, which would result in inodes not getting purged even after fuse_forget.

pool-name=fuse:dentry_t
hot-count=32761

If '32761' is the current active dentry count, it still doesn't seem to match up to the inode count.
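
For reference, the figures quoted above can be compared with a few lines of Python. This is a rough sketch: parse_memusage is a hypothetical helper written for this note, and treating num_allocs as a proxy for the number of inode contexts is an assumption. With these numbers, about 22% of the gf_common_mt_inode_ctx allocations were released by drop_caches, each allocation is 288 bytes, and the remaining allocation count is well over fifty times the dentry hot-count.

```python
def parse_memusage(section):
    """Parse the 'key=value' lines of one statedump memusage section."""
    stats = {}
    for line in section.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and value.isdigit():
            stats[key] = int(value)
    return stats


# Excerpts copied from the two statedumps quoted in this thread.
BEFORE = """\
[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=706766688
num_allocs=2454051
"""

AFTER = """\
[mount/fuse.fuse - usage-type gf_common_mt_inode_ctx memusage]
size=550996416
num_allocs=1913182
"""

DENTRY_HOT_COUNT = 32761  # hot-count of pool fuse:dentry_t, quoted above

before = parse_memusage(BEFORE)
after = parse_memusage(AFTER)

freed_allocs = before["num_allocs"] - after["num_allocs"]
freed_pct = 100.0 * freed_allocs / before["num_allocs"]
bytes_per_alloc = before["size"] // before["num_allocs"]
allocs_per_dentry = after["num_allocs"] / DENTRY_HOT_COUNT

print(f"freed allocations: {freed_allocs} ({freed_pct:.1f}%)")
print(f"bytes per allocation: {bytes_per_alloc}")
print(f"remaining allocations per hot dentry: {allocs_per_dentry:.1f}")
```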

Thanks,

Soumya

And here is Valgrind output:

https://gist.github.com/2490aeac448320d98596

On Saturday, 30 January 2016, 22:56:37 EET Xavier Hernandez wrote:

There's another inode leak caused by incorrect counting of lookups on directory reads.

Here's a patch that solves the problem for 3.7:

http://review.gluster.org/13324

Hopefully with this patch the memory leaks should disappear.

Xavi

On 29.01.2016 19:09, Oleksandr Natalenko wrote:

Here is an intermediate summary of the current investigation into memory leaks in the FUSE client.

I use the GlusterFS v3.7.6 release with the following patches:

===
Kaleb S KEITHLEY (1):
      fuse: use-after-free fix in fuse-bridge, revisited

Pranith Kumar K (1):
      mount/fuse: Fix use-after-free crash

Soumya Koduri (3):
      gfapi: Fix inode nlookup counts
      inode: Retire the inodes from the lru list in inode_table_destroy
      upcall: free the xdr* allocations
===

With those patches we got the API leaks fixed (I hope, brief tests show that) and got rid of the "kernel notifier loop terminated" message. Nevertheless, the FUSE client still leaks.

I have several test volumes with several million small files (100K…2M on average). I do 2 types of FUSE client testing:

1) find /mnt/volume -type d
2) rsync -av -H /mnt/source_volume/* /mnt/target_volume/

And the most up-to-date results are shown below:

=== find /mnt/volume -type d ===

Memory consumption: ~4G
Statedump: https://gist.github.com/10cde83c63f1b4f1dd7a
Valgrind: https://gist.github.com/097afb01ebb2c5e9e78d

I guess, fuse-bridge/fuse-resolve. related.

=== rsync -av -H /mnt/source_volume/* /mnt/target_volume/ ===

Memory consumption: ~3.3...4G
Statedump (target volume): https://gist.github.com/31e43110eaa4da663435
Valgrind (target volume): https://gist.github.com/f8e0151a6878cacc9b1a

I guess, DHT-related.

Give me more patches to test :).

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

-- 
Raghavendra G

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users