Frequent network failures cause an FD leak after setting a short network.tcp-timeout

Hi all,

First of all, I sent a mail about an FD leak after network failure before:
 - http://supercolony.gluster.org/pipermail/gluster-devel/2014-August/041969.html
   'About file descriptor leak in glusterfsd daemon after network failure'

Thanks to Niels de Vos for pointing me to bug 1129787 and the patch that addresses it.
 - https://bugzilla.redhat.com/show_bug.cgi?id=1129787
I applied the patch http://review.gluster.org/8065 and set network.tcp-timeout to
30s to verify whether the issue is fixed.
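(For reference, the timeout is a per-volume option set with the gluster CLI, something like:
    # gluster volume set <volname> network.tcp-timeout 30
with the name of the affected volume.)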

The patch and the network.tcp-timeout setting work fine in most cases. However, I
found that under a very subtle condition the FD still leaks. Here are the details:

The setup is a replicated gluster volume with 2 server nodes. On the client side, I
use mount.glusterfs to access that volume. I wrote a test program which opens a file
in the volume, flocks it (only once), and then reads and writes the file frequently.
On one of the server nodes, I simulate a network failure for 15s and then reconnect.
Note that 15s is less than the network.tcp-timeout of 30s, so the TCP connection is
kept alive. Under these conditions the FD of the brick file is sometimes leaked:
after the test program exits, the FD on the server side is not closed. If the test
program is restarted, it fails in flock with a 'Resource temporarily unavailable'
error.
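
In essence the test program does the following (a minimal sketch; the mount path,
the non-blocking LOCK_NB flag, the buffer size and the sleep interval are
illustrative, my real program differs in those details):

    /* open a file on the glusterfs mount, flock it once, then read/write it in a loop */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/file.h>

    int main(void)
    {
        char buf[4096];
        int fd = open("/mnt/gluster/testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || flock(fd, LOCK_EX | LOCK_NB) < 0) {
            /* on the second run this is where the
             * 'Resource temporarily unavailable' (EAGAIN) error shows up */
            perror("open/flock");
            return 1;
        }
        memset(buf, 'x', sizeof(buf));
        for (;;) {
            /* keep reading and writing the file frequently */
            if (pwrite(fd, buf, sizeof(buf), 0) < 0 ||
                pread(fd, buf, sizeof(buf), 0) < 0)
                perror("pwrite/pread");
            usleep(100000);
        }
        return 0;
    }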

Network failure timeline:
---(15s connected)---|---(15s disconnected)---|---(15s connected)---|...repeat...

To trace this issue, I added some logging code in xlators/storage/posix. I found that
on the server with the failing network, glusterfsd opens and closes the file
repeatedly, about once per second, while the network is connected (is this a stat
operation?). The leaked FD appears at the moment of the disconnect: the last opened
fd is never closed.
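
For reference, the instrumentation is roughly the following (a sketch; the exact
lines I added may differ slightly):

    /* in posix_open(), after the file is opened */
    gf_log (this->name, GF_LOG_INFO, "real_path: %s, _fd=%d", real_path, _fd);

    /* in posix_release(), when the stored fd is released */
    gf_log (this->name, GF_LOG_INFO, "posix_release: fd=%d", pfd->fd);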

So could this kind of frequent network failure cause the message that releases the fd to be lost?

While investigating this issue, I also found this patch:
 - http://review.gluster.org/#/c/7470
But I am not sure whether it covers the same situation as mine.

My log:
(Connected logs)
[2014-09-09 12:05:49.055772] I [posix.c:1846:posix_open] 0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=18
[2014-09-09 12:05:49.058009] I [posix.c:2286:posix_release] 0-vs_vol_rep2-posix: posix_release: fd=18
[2014-09-09 12:05:49.058499] I [posix.c:1846:posix_open] 0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=18
[2014-09-09 12:05:49.060655] I [posix.c:2286:posix_release] 0-vs_vol_rep2-posix: posix_release: fd=18
[2014-09-09 12:05:49.061587] I [posix.c:1846:posix_open] 0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=18
...
// Disconnected for 15s; we can see that fd=18 is not closed.
...
[2014-09-09 12:06:15.910357] I [posix-helpers.c:1027:__posix_fd_ctx_get] 0-vs_vol_rep2-posix: open, _fd=16, real_path=/mnt/h1/.glusterfs/44/3d/443d50b8-5187-48d0-92ba-779c62cb9fe0
[2014-09-09 12:06:15.910637] I [posix.c:2286:posix_release] 0-vs_vol_rep2-posix: posix_release: fd=16
[2014-09-09 12:06:15.919489] I [posix-helpers.c:1027:__posix_fd_ctx_get] 0-vs_vol_rep2-posix: open, _fd=16, real_path=/mnt/h1/.glusterfs/44/3d/443d50b8-5187-48d0-92ba-779c62cb9fe0
[2014-09-09 12:06:15.919814] I [posix.c:2286:posix_release] 0-vs_vol_rep2-posix: posix_release: fd=16
[2014-09-09 12:06:16.003219] I [posix-helpers.c:1027:__posix_fd_ctx_get] 0-vs_vol_rep2-posix: open, _fd=16, real_path=/mnt/h1/.glusterfs/44/3d/443d50b8-5187-48d0-92ba-779c62cb9fe0
[2014-09-09 12:06:16.003395] I [posix.c:2286:posix_release] 0-vs_vol_rep2-posix: posix_release: fd=16
[2014-09-09 12:06:45.015708] I [posix.c:1846:posix_open] 0-vs_vol_rep2-posix: real_path: /mnt/h1/testfile, _fd=23

// On the server side, lsof shows
#lsof /mnt/h1/testfile
COMMAND     PID USER   FD   TYPE DEVICE    SIZE/OFF      NODE NAME
glusterfs 10583 root   18u   REG   8,17 21478375424 335543947 /mnt/h1/testfile

// On the server side, gluster volume status fd shows
# gluster volume status vol_rep2 h1:/mnt/h1 fd
FD tables for volume vs_vol_rep2
----------------------------------------------
Brick : vol_rep2 h1:/mnt/h1
Connection 1:
RefCount = 0  MaxFDs = 128  FirstFree = 2
FD Entry            PID                 RefCount            Flags              
--------            ---                 --------            -----              
0                   9153                1                   16386     
 
Connection 2:
RefCount = 0  MaxFDs = 128  FirstFree = 0
No open fds
 
// Note that 9153 is the PID of my test program, which has EXITED already.

// statedump
[xlator.features.locks.vs_vol_rep2-locks.inode]
path=/testfile
mandatory=0
posixlk-count=1
posixlk.posixlk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 9153, owner=dac8e118f3c2e02e, transport=0x7faba00090d0, , granted at Tue Sep  9 20:05:49 2014
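
// (A statedump like the above can be taken with, e.g., "gluster volume statedump <volname>".)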

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
