Re: [Gluster-users] Memory leak in GlusterFS FUSE client

Soumya Koduri <skoduri@xxxxxxxxxx> · Mon, 28 Dec 2015 04:02:52 -0500 (EST)

----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> To: "Oleksandr Natalenko" <oleksandr@xxxxxxxxxxxxxx>, "Soumya Koduri" <skoduri@xxxxxxxxxx>
> Cc: gluster-users@xxxxxxxxxxx, gluster-devel@xxxxxxxxxxx
> Sent: Monday, December 28, 2015 9:32:07 AM
> Subject: Re:  [Gluster-users] Memory leak in GlusterFS FUSE client
> 
> 
> 
> On 12/26/2015 04:45 AM, Oleksandr Natalenko wrote:
> > Also, here is the valgrind output for our custom tool, which traverses a
> > GlusterFS volume (with simple stats) just like the find tool. In this case
> > NFS-Ganesha is not used.
> >
> > https://gist.github.com/e4602a50d3c98f7a2766
> hi Oleksandr,
>        I went through the code. Both NFS-Ganesha and the custom tool use
> gfapi, and the leak stems from that. I am not very familiar with
> this part of the code, but there seems to be one inode_unref() missing
> in the failure path of resolution. Not sure if that corresponds
> to the leaks.
> 
> Soumya,
>         Could this be the issue? review.gluster.org seems to be down, so
> I couldn't send the patch. Please ping me on IRC.
> diff --git a/api/src/glfs-resolve.c b/api/src/glfs-resolve.c
> index b5efcba..52b538b 100644
> --- a/api/src/glfs-resolve.c
> +++ b/api/src/glfs-resolve.c
> @@ -467,9 +467,11 @@ priv_glfs_resolve_at (struct glfs *fs, xlator_t *subvol, inode_t *at,
>                  }
>          }
> 
> -       if (parent && next_component)
> +       if (parent && next_component) {
> +               inode_unref (parent);
> +               parent = NULL;
>                  /* resolution failed mid-way */
>                  goto out;
> +        }
> 
>          /* At this point, all components up to the last parent directory
>             have been resolved successfully (@parent). Resolution of basename
> 
Yes, this could be one of the reasons. There are a few leaks related to inode references in gfAPI; see below.

On the GlusterFS side, it looks like the majority of the leaks are related to inodes and their contexts. Possible reasons I can think of are:

1) When there is a graph switch, the old inode table and its entries are not purged (this is a known issue). There was an effort to fix this, but I think it had other side effects and hence was not applied. Maybe we should revive those changes.

2) Related to the above, old entries can be purged when a request comes in with a reference to an old inode (as part of 'glfs_resolve_inode'), provided their reference counts are properly decremented. But this is not happening at the moment in gfapi.

3) Applications should hold and release their references as needed. Certain fixes are needed in this area as well (including the fix provided by Pranith above).
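
To make point (3) concrete, here is a minimal sketch in C using toy stand-ins. The names toy_inode_t, toy_inode_new, toy_inode_unref, toy_resolve and toy_live_inodes are hypothetical and are not gfapi code. It only shows the pattern: every exit from a resolution loop, including the mid-way failure path, has to drop the reference it still holds, otherwise the entry can never be freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an inode: only the reference count matters here.
   This is an illustration, not the real GlusterFS inode_t. */
typedef struct toy_inode {
    int refcount;
} toy_inode_t;

/* Number of toy inodes currently allocated (our "leak detector"). */
static int live_inodes = 0;

/* Allocate a toy inode; the caller owns one reference. */
toy_inode_t *toy_inode_new(void)
{
    toy_inode_t *in = calloc(1, sizeof(*in));
    if (in) {
        in->refcount = 1;
        live_inodes++;
    }
    return in;
}

/* Drop one reference; free the inode when the last one goes away. */
void toy_inode_unref(toy_inode_t *in)
{
    if (!in)
        return;
    assert(in->refcount > 0);
    if (--in->refcount == 0) {
        free(in);
        live_inodes--;
    }
}

/* Walk `depth` path components. Component index `fail_at` fails to
   resolve (pass a negative value for "never fails"). When
   `release_on_failure` is 0, the failure path forgets to unref the
   parent it is still holding, which is the kind of leak discussed above. */
int toy_resolve(int depth, int fail_at, int release_on_failure)
{
    toy_inode_t *parent = toy_inode_new(); /* reference on the root */
    if (!parent)
        return -1;

    for (int i = 0; i < depth; i++) {
        if (i == fail_at) {
            /* resolution failed mid-way */
            if (release_on_failure)
                toy_inode_unref(parent);
            return -1;
        }
        toy_inode_t *child = toy_inode_new();
        if (!child) {
            toy_inode_unref(parent);
            return -1;
        }
        toy_inode_unref(parent); /* done with the previous component */
        parent = child;
    }

    toy_inode_unref(parent); /* success path releases the last reference */
    return 0;
}

/* How many toy inodes are still allocated. */
int toy_live_inodes(void)
{
    return live_inodes;
}
```

Calling toy_resolve(3, 1, 0) leaves one toy inode allocated, while toy_resolve(3, 1, 1) leaves none; toy_live_inodes() reports the difference.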

From code inspection, I have made changes to fix a few leaks of cases (2) and (3) with respect to gfAPI.
	http://review.gluster.org/#/c/13096 (yet to test the changes)

I haven't yet narrowed down any suspects specific to NFS-Ganesha. I will re-check and update.

Thanks,
Soumya

> Pranith
> >
> > One may see GlusterFS-related leaks here as well.
> >
> > On Friday, 25 December 2015, 20:28:13 EET Soumya Koduri wrote:
> >> On 12/24/2015 09:17 PM, Oleksandr Natalenko wrote:
> >>> Another addition: it seems to be GlusterFS API library memory leak
> >>> because NFS-Ganesha also consumes huge amount of memory while doing
> >>> ordinary "find . -type f" via NFSv4.2 on remote client. Here is memory
> >>> usage:
> >>>
> >>> ===
> >>> root      5416 34.2 78.5 2047176 1480552 ?     Ssl  12:02 117:54
> >>> /usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f
> >>> /etc/ganesha/ganesha.conf -N NIV_EVENT
> >>> ===
> >>>
> >>> 1.4G is too much for simple stat() :(.
> >>>
> >>> Ideas?
> >> nfs-ganesha also has a cache layer which can scale to millions of entries
> >> depending on the number of files/directories being looked up. However,
> >> there are parameters to tune it. So either try stat with fewer entries or
> >> add the block below to the nfs-ganesha.conf file, set low limits and check
> >> the difference. That may help us narrow down how much memory is actually
> >> consumed by core nfs-ganesha and gfAPI.
> >>
> >> CACHEINODE {
> >> 	Cache_Size(uint32, range 1 to UINT32_MAX, default 32633); # cache size
> >> 	Entries_HWMark(uint32, range 1 to UINT32_MAX, default 100000); # max no. of entries in the cache
> >> }
> >>
> >> Thanks,
> >> Soumya
> >>
> >>> 24.12.2015 16:32, Oleksandr Natalenko wrote:
> >>>> This is still an issue in 3.7.6. Any suggestions?
> >>>>
> >>>> 24.09.2015 10:14, Oleksandr Natalenko wrote:
> >>>>> In our GlusterFS deployment we've encountered something like a memory
> >>>>> leak in the GlusterFS FUSE client.
> >>>>>
> >>>>> We use replicated (×2) GlusterFS volume to store mail (exim+dovecot,
> >>>>> maildir format). Here is inode stats for both bricks and mountpoint:
> >>>>>
> >>>>> ===
> >>>>> Brick 1 (Server 1):
> >>>>>
> >>>>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>>>> /dev/mapper/vg_vd1_misc-lv08_mail 578768144 10954918 567813226    2% /bricks/r6sdLV08_vd1_mail
> >>>>>
> >>>>> Brick 2 (Server 2):
> >>>>>
> >>>>> Filesystem                           Inodes    IUsed     IFree IUse% Mounted on
> >>>>> /dev/mapper/vg_vd0_misc-lv07_mail 578767984 10954913 567813071    2% /bricks/r6sdLV07_vd0_mail
> >>>>>
> >>>>> Mountpoint (Server 3):
> >>>>>
> >>>>> Filesystem            Inodes    IUsed     IFree IUse% Mounted on
> >>>>> glusterfs.xxx:mail 578767760 10954915 567812845    2% /var/spool/mail/virtual
> >>>>> ===
> >>>>>
> >>>>> glusterfs.xxx domain has two A records for both Server 1 and Server 2.
> >>>>>
> >>>>> Here is volume info:
> >>>>>
> >>>>> ===
> >>>>> Volume Name: mail
> >>>>> Type: Replicate
> >>>>> Volume ID: f564e85c-7aa6-4170-9417-1f501aa98cd2
> >>>>> Status: Started
> >>>>> Number of Bricks: 1 x 2 = 2
> >>>>> Transport-type: tcp
> >>>>> Bricks:
> >>>>> Brick1: server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>> Brick2: server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>> Options Reconfigured:
> >>>>> nfs.rpc-auth-allow: 1.2.4.0/24,4.5.6.0/24
> >>>>> features.cache-invalidation-timeout: 10
> >>>>> performance.stat-prefetch: off
> >>>>> performance.quick-read: on
> >>>>> performance.read-ahead: off
> >>>>> performance.flush-behind: on
> >>>>> performance.write-behind: on
> >>>>> performance.io-thread-count: 4
> >>>>> performance.cache-max-file-size: 1048576
> >>>>> performance.cache-size: 67108864
> >>>>> performance.readdir-ahead: off
> >>>>> ===
> >>>>>
> >>>>> Soon after mounting and starting exim/dovecot, the glusterfs client
> >>>>> process begins to consume a huge amount of RAM:
> >>>>>
> >>>>> ===
> >>>>> user@server3 ~$ ps aux | grep glusterfs | grep mail
> >>>>> root     28895 14.4 15.0 15510324 14908868 ?   Ssl  Sep03 4310:05
> >>>>> /usr/sbin/glusterfs --fopen-keep-cache --direct-io-mode=disable
> >>>>> --volfile-server=glusterfs.xxx --volfile-id=mail
> >>>>> /var/spool/mail/virtual
> >>>>> ===
> >>>>>
> >>>>> That is, ~15 GiB of RAM.
> >>>>>
> >>>>> Also, we've tried to use the mountpoint within a separate KVM VM with
> >>>>> 2 or 3 GiB of RAM, and soon after starting the mail daemons the OOM
> >>>>> killer hit the glusterfs client process.
> >>>>>
> >>>>> Mounting same share via NFS works just fine. Also, we have much less
> >>>>> iowait and loadavg on client side with NFS.
> >>>>>
> >>>>> Also, we've tried to change IO threads count and cache size in order
> >>>>> to limit memory usage with no luck. As you can see, total cache size
> >>>>> is 4×64==256 MiB (compare to 15 GiB).
> >>>>>
> >>>>> Enabling or disabling stat-prefetch, read-ahead and readdir-ahead
> >>>>> didn't help either.
> >>>>>
> >>>>> Here are volume memory stats:
> >>>>>
> >>>>> ===
> >>>>> Memory status for volume : mail
> >>>>> ----------------------------------------------
> >>>>> Brick : server1.xxx:/bricks/r6sdLV08_vd1_mail/mail
> >>>>> Mallinfo
> >>>>> --------
> >>>>> Arena    : 36859904
> >>>>> Ordblks  : 10357
> >>>>> Smblks   : 519
> >>>>> Hblks    : 21
> >>>>> Hblkhd   : 30515200
> >>>>> Usmblks  : 0
> >>>>> Fsmblks  : 53440
> >>>>> Uordblks : 18604144
> >>>>> Fordblks : 18255760
> >>>>> Keepcost : 114112
> >>>>>
> >>>>> Mempool Stats
> >>>>> -------------
> >>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>> mail-server:fd_t                              0      1024          108   30773120      137        0            0
> >>>>> mail-server:dentry_t                      16110       274           84  235676148    16384  1106499         1152
> >>>>> mail-server:inode_t                       16363        21          156  237216876    16384  1876651         1169
> >>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      79204        4        0            0
> >>>>> mail-locks:pl_local_t                         0        32          148    6812757        4        0            0
> >>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>> mail-marker:marker_local_t                    0       128          332      64980        3        0            0
> >>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>> mail-server:rpcsvc_request_t                  0       512         2828   45462533       34        0            0
> >>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>> glusterfs:log_buf_t                           5       251          140       3452        6        0            0
> >>>>> glusterfs:data_t                            242     16141           52  480115498      664        0            0
> >>>>> glusterfs:data_pair_t                       230     16153           68  179483528      275        0            0
> >>>>> glusterfs:dict_t                             23      4073          140  303751675      627        0            0
> >>>>> glusterfs:call_stub_t                         0      1024         3764   45290655       34        0            0
> >>>>> glusterfs:call_stack_t                        1      1023         1708   43598469       34        0            0
> >>>>> glusterfs:call_frame_t                        1      4095          172  336219655      184        0            0
> >>>>> ----------------------------------------------
> >>>>> Brick : server2.xxx:/bricks/r6sdLV07_vd0_mail/mail
> >>>>> Mallinfo
> >>>>> --------
> >>>>> Arena    : 38174720
> >>>>> Ordblks  : 9041
> >>>>> Smblks   : 507
> >>>>> Hblks    : 21
> >>>>> Hblkhd   : 30515200
> >>>>> Usmblks  : 0
> >>>>> Fsmblks  : 51712
> >>>>> Uordblks : 19415008
> >>>>> Fordblks : 18759712
> >>>>> Keepcost : 114848
> >>>>>
> >>>>> Mempool Stats
> >>>>> -------------
> >>>>> Name                                   HotCount ColdCount PaddedSizeof AllocCount MaxAlloc   Misses Max-StdAlloc
> >>>>> ----                                   -------- --------- ------------ ---------- -------- -------- ------------
> >>>>> mail-server:fd_t                              0      1024          108    2373075      133        0            0
> >>>>> mail-server:dentry_t                      14114      2270           84    3513654    16384     2300          267
> >>>>> mail-server:inode_t                       16374        10          156    6766642    16384   194635         1279
> >>>>> mail-trash:fd_t                               0      1024          108          0        0        0            0
> >>>>> mail-trash:dentry_t                           0     32768           84          0        0        0            0
> >>>>> mail-trash:inode_t                            4     32764          156          4        4        0            0
> >>>>> mail-trash:trash_local_t                      0        64         8628          0        0        0            0
> >>>>> mail-changetimerecorder:gf_ctr_local_t        0        64        16540          0        0        0            0
> >>>>> mail-changelog:rpcsvc_request_t               0         8         2828          0        0        0            0
> >>>>> mail-changelog:changelog_local_t              0        64          116          0        0        0            0
> >>>>> mail-bitrot-stub:br_stub_local_t              0       512           84      71354        4        0            0
> >>>>> mail-locks:pl_local_t                         0        32          148    8135032        4        0            0
> >>>>> mail-upcall:upcall_local_t                    0       512          108          0        0        0            0
> >>>>> mail-marker:marker_local_t                    0       128          332      65005        3        0            0
> >>>>> mail-quota:quota_local_t                      0        64          476          0        0        0            0
> >>>>> mail-server:rpcsvc_request_t                  0       512         2828   12882393       30        0            0
> >>>>> glusterfs:struct saved_frame                  0         8          124          2        2        0            0
> >>>>> glusterfs:struct rpc_req                      0         8          588          2        2        0            0
> >>>>> glusterfs:rpcsvc_request_t                    1         7         2828          2        1        0            0
> >>>>> glusterfs:log_buf_t                           5       251          140       3443        6        0            0
> >>>>> glusterfs:data_t                            242     16141           52  138743429      290        0            0
> >>>>> glusterfs:data_pair_t                       230     16153           68  126649864      270        0            0
> >>>>> glusterfs:dict_t                             23      4073          140   20356289       63        0            0
> >>>>> glusterfs:call_stub_t                         0      1024         3764   13678560       31        0            0
> >>>>> glusterfs:call_stack_t                        1      1023         1708   11011561       30        0            0
> >>>>> glusterfs:call_frame_t                        1      4095          172  125764190      193        0            0
> >>>>> ----------------------------------------------
> >>>>> ===
> >>>>>
> >>>>> So, my questions are:
> >>>>>
> >>>>> 1) what one should do to limit GlusterFS FUSE client memory usage?
> >>>>> 2) what one should do to prevent client high loadavg because of high
> >>>>> iowait because of multiple concurrent volume users?
> >>>>>
> >>>>> Server/client OS is CentOS 7.1, GlusterFS server version is 3.7.3,
> >>>>> GlusterFS client version is 3.7.4.
> >>>>>
> >>>>> Any additional info needed?
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users@xxxxxxxxxxx
> >>> http://www.gluster.org/mailman/listinfo/gluster-users
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
> 