Not expected behavior, but it has been noted before: Joe Julian posted this
only a few days ago, and it sounds similar to your problem:
https://bugzilla.redhat.com/show_bug.cgi?id=849526

On Thursday, September 20, 2012 11:58:50 AM Lonni J Friedman wrote:
> No replies?  Surely this can't be expected behavior?
>
> On Tue, Sep 18, 2012 at 2:59 PM, Lonni J Friedman <netllama at gmail.com> wrote:
> > Greetings,
> > I'm running version 3.3.0 on Fedora16-x86_64.  I have two bricks set up
> > with a volume doing basic replication on an XFS-formatted filesystem.
> > I've NFS-mounted the volume on a third system and invoked bonnie++ to
> > write to the NFS mount point.  After a few minutes, I noticed that
> > bonnie++ didn't seem to be generating any more progress output, at which
> > point I started checking assorted logs to see if anything was wrong.  At
> > that point, I saw that the client system was no longer able to write to
> > the NFS mount point, and dmesg (and /var/log/messages) was spewing these
> > warnings like crazy (dozens per second):
> > nfs: server 10.31.99.166 not responding, still trying
> >
> > Those warnings started at 14:40:58 on the client system, but oddly
> > stopped a few seconds later at 14:41:04.  Here's the full bonnie++
> > output (/mnt/gv0 is where the gluster volume is NFS-mounted on the
> > client):
> > ########
> > [root@cuda-ljf0 ~]# bonnie++ -d /mnt/gv0 -u root
> > Using uid:0, gid:0.
> > Writing a byte at a time...done
> > Writing intelligently...done
> > Rewriting...
> > ########
> >
> > Here's what's in the glusterfs logs at the moment:
> > ##########
> > # tail etc-glusterfs-glusterd.vol.log
> > [2012-09-18 14:54:39.026557] I [glusterd-handler.c:542:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
> > [2012-09-18 14:54:39.029463] I [glusterd-handler.c:1417:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
> > [2012-09-18 14:54:46.993426] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0
> > [2012-09-18 14:54:46.993503] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
> > [2012-09-18 14:54:46.993520] E [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1
> > [2012-09-18 14:55:47.175521] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
> > [2012-09-18 14:55:47.181048] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
> > [2012-09-18 14:55:49.306776] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0
> > [2012-09-18 14:55:49.306834] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
> > [2012-09-18 14:55:49.306844] E [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1
> > # tail -f cli.log
> > [2012-09-18 14:55:47.176824] I [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
> > [2012-09-18 14:55:47.180959] I [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
> > [2012-09-18 14:55:47.181128] I [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
> > [2012-09-18 14:55:47.181167] I [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
> > [2012-09-18 14:55:47.181214] I [input.c:46:cli_batch] 0-: Exiting with: 0
> > [2012-09-18 14:55:49.244795] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
> > [2012-09-18 14:55:49.307054] I [cli-rpc-ops.c:5905:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to heal volume
> > [2012-09-18 14:55:49.307274] W [dict.c:2339:dict_unserialize] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xa5) [0x328ca10365] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x328ca0f965] (-->gluster(gf_cli3_1_heal_volume_cbk+0x1d4) [0x4225e4]))) 0-dict: buf is null!
> > [2012-09-18 14:55:49.307289] E [cli-rpc-ops.c:5930:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate memory
> > [2012-09-18 14:55:49.307314] I [input.c:46:cli_batch] 0-: Exiting with: -1
> > ##########
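[Inline note, not part of the original mail: the repeated "Unable to get lock
for uuid ... lock held by: 1d3fb6c7-..." errors above mean glusterd believes a
peer still holds the cluster-wide transaction lock, which is why the
"gluster volume heal gv0 info" further down is refused.  If the lock is simply
stale, one rough way to locate and clear it is sketched below; the service
command is an assumption for a Fedora 16 host, and restarting glusterd only
bounces the management daemon, not the brick processes.]

    cat /var/lib/glusterd/glusterd.info   # the local node's UUID
    gluster peer status                   # UUIDs of the other peers
    # on the node whose UUID matches the "lock held by" message:
    service glusterd restart              # the lock is held in glusterd's memory, so a restart releases it
    gluster volume heal gv0 info          # then retry the command that was being refused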
> >
> > I'd be happy to provide more if someone requests something specific.
> >
> > Not sure what other information to provide at this point, but here's
> > the basics of the gluster setup:
> > ##########
> > # gluster volume info all
> >
> > Volume Name: gv0
> > Type: Replicate
> > Volume ID: 200046fc-1b5f-460c-b54b-96932e31ed3c
> > Status: Started
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: 10.31.99.165:/mnt/sdb1
> > Brick2: 10.31.99.166:/mnt/sdb1
> > # gluster volume heal gv0 info
> > operation failed
> > ##########
> >
> > I just noticed that glusterfs seems to be rapidly heading towards OOM
> > territory.  The glusterfs daemon is currently consuming 90% of MEM
> > according to top.
> >
> > thanks
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
What does it say about a society that would rather send its children
to kill and die for oil than to get on a bike?
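[Aside, not part of the original thread, on the memory growth reported above:
if it is the gluster NFS server process that is ballooning, a statedump of its
allocations is the kind of evidence worth attaching to the Bugzilla report.  A
minimal sketch, assuming GlusterFS 3.3 defaults; the dump directory and the
exact process to target are assumptions and may differ on your systems.]

    # on the server where top shows glusterfs at ~90% MEM:
    gluster volume statedump gv0     # CLI-triggered statedump of the volume's brick processes
    ps aux | grep [g]luster          # or find the PID of the glusterfs NFS process directly
    kill -USR1 <PID>                 # SIGUSR1 asks a gluster process to write a statedump
    ls /tmp/*.dump.*                 # dumps land in /tmp by default here (assumption); check the mem pool sections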