Re: [Gluster-users] 3.6.6 issues

Ben,

No, it is no longer in this state. It seems to have settled out over the weekend and is behaving normally. I am guessing here, but I think the issue might have been 3.6.3 clients connecting to the 3.6.6 server. We have upgraded all of our clients to 3.6.6 and are no longer having the problem.

Is there a way to check the client version of all clients connected to a gluster server?
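Something like the following might get close (a sketch only; the brick-log filename and the exact wording of the connect message are assumptions based on older 3.6-style logs):

    # on each server; the brick log name mirrors the brick path (assumption)
    grep "accepted client from" /var/log/glusterfs/bricks/data-brick01a-homegfs.log

    # list the clients currently connected to the volume (endpoints only, no version)
    gluster volume status homegfs clients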

David


------ Original Message ------
From: "Ben Turner" <bturner@xxxxxxxxxx>
To: "David Robinson" <drobinson@xxxxxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx; "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: 10/19/2015 8:20:10 PM
Subject: Re: [Gluster-users] 3.6.6 issues

Hi David. Is the cluster still in this state? If so, can you grab a couple of stack traces from the offending brick (gfs01a) process with gstack? Check with top (or something similar) that it's the brick process spinning your CPUs; we want to be sure the stack traces are from the offending process. That will give us an idea of what it is chewing on. Beyond that, maybe you could take a couple of sosreports on the servers and open a BZ. It may also be a good idea to roll back versions until we can get this sorted out, since I don't know how long you can leave the cluster in this state. Once you get a Bugzilla open, I'll try to reproduce what you are seeing.
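For example, something along these lines (a sketch; <PID> is a placeholder for whatever PID gluster volume status reports for the busy brick, and the output paths are arbitrary):

    # on gfs01a: find the brick PIDs, confirm which one is eating CPU, then
    # grab a few stack traces spaced out over time
    gluster volume status homegfs
    top -bn1 -p <PID>
    for i in 1 2 3; do gstack <PID> > /tmp/brick-gstack.$i; sleep 10; done

    # then, on each server, collect an sosreport to attach to the BZ
    sosreport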

-b

----- Original Message -----
From: "David Robinson" <david.robinson@xxxxxxxxxxxxx>
To: gluster-users@xxxxxxxxxxx, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Saturday, October 17, 2015 12:19:36 PM
Subject: [Gluster-users] 3.6.6 issues

I upgraded my storage server from 3.6.3 to 3.6.6 and am now having issues. My setup (4x2) is shown below. One of the bricks (gfs01a) has a very high CPU load even though the load on the other three bricks (gfs01b, gfs02a, gfs02b) is almost zero. The FUSE-mounted partition has been extremely slow and basically unusable since the upgrade. I am getting a lot of the messages shown below in the logs on gfs01a and gfs01b; nothing out of the ordinary is showing up on the gfs02a/gfs02b bricks.

Can someone help?
 [root@gfs01b glusterfs]# gluster volume info homegfs

 Volume Name: homegfs
 Type: Distributed-Replicate
 Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
 Status: Started
 Number of Bricks: 4 x 2 = 8
 Transport-type: tcp
 Bricks:
 Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
 Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
 Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
 Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
 Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
 Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
 Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
 Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
 Options Reconfigured:
 changelog.rollover-time: 15
 changelog.fsync-interval: 3
 changelog.changelog: on
 geo-replication.ignore-pid-check: on
 geo-replication.indexing: off
 storage.owner-gid: 100
 network.ping-timeout: 10
 server.allow-insecure: on
 performance.write-behind-window-size: 128MB
 performance.cache-size: 128MB
 performance.io-thread-count: 32
 server.manage-gids: on
[root@gfs01a glusterfs]# tail -f cli.log
[2015-10-17 16:05:44.299933] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:05:44.331233] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-17 16:06:33.397631] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:06:33.432970] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-17 16:11:22.441290] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:11:22.472227] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-17 16:15:44.176391] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:15:44.205064] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-17 16:16:33.366424] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:16:33.377160] I [input.c:36:cli_batch] 0-: Exiting with: 0
[root@gfs01a glusterfs]# tail etc-glusterfs-glusterd.vol.log
[2015-10-17 15:56:33.177207] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Source
[2015-10-17 16:01:22.303635] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Software
[2015-10-17 16:05:44.320555] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume homegfs
[2015-10-17 16:06:17.204783] W [rpcsvc.c:254:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330)
[2015-10-17 16:06:17.204811] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
[2015-10-17 16:06:33.408695] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Source
[2015-10-17 16:11:22.462374] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Software
[2015-10-17 16:12:30.608092] E [glusterd-op-sm.c:207:glusterd_get_txn_opinfo] 0-: Unable to get transaction opinfo for transaction ID : d143b66b-2ac9-4fd9-8635-fe1eed41d56b
[2015-10-17 16:15:44.198292] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume homegfs
[2015-10-17 16:16:33.368170] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Source
[root@gfs01b glusterfs]# tail -f glustershd.log
[2015-10-17 16:11:45.996447] I [afr-self-heal-metadata.c:54:__afr_selfheal_metadata_do] 0-homegfs-replicate-1: performing metadata selfheal on 0a65d73a-a416-418e-92f0-5cec7d240433
[2015-10-17 16:11:46.030947] I [afr-self-heal-common.c:476:afr_log_selfheal] 0-homegfs-replicate-1: Completed metadata selfheal on 0a65d73a-a416-418e-92f0-5cec7d240433. source=1 sinks=0
[2015-10-17 16:11:46.031241] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
[2015-10-17 16:11:46.031633] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
[2015-10-17 16:11:47.043367] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
[2015-10-17 16:11:47.054199] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
[2015-10-17 16:12:48.001869] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
[2015-10-17 16:12:48.012671] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
[2015-10-17 16:13:49.011591] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:d2714957-0c83-4ab2-8cfc-1931c8e9d0bf> (d2714957-0c83-4ab2-8cfc-1931c8e9d0bf)
[2015-10-17 16:13:49.018600] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-homegfs-client-3: remote operation failed: No such file or directory. Path: <gfid:87c5f875-c3e7-4b14-807a-4e6d940750fc> (87c5f875-c3e7-4b14-807a-4e6d940750fc)
[root@gfs01b glusterfs]# tail cli.log
[2015-10-16 10:52:16.002922] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-16 10:52:16.167432] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-16 10:52:18.248024] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-17 16:12:30.607603] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:12:30.628810] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-17 16:12:33.992818] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:12:33.998944] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2015-10-17 16:12:38.604461] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2015-10-17 16:12:38.605532] I [cli-rpc-ops.c:588:gf_cli_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2015-10-17 16:12:38.605659] I [input.c:36:cli_batch] 0-: Exiting with: 0
[root@gfs01b glusterfs]# tail etc-glusterfs-glusterd.vol.log
[2015-10-16 14:29:56.495120] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request
[2015-10-16 14:29:59.369109] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request
[2015-10-16 14:29:59.512093] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request
[2015-10-16 14:30:02.383574] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request
[2015-10-16 14:30:02.529206] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request
[2015-10-16 16:01:20.389100] E [rpcsvc.c:617:rpcsvc_handle_rpc_call] 0-rpc-service: Request received from non-privileged port. Failing request
[2015-10-17 16:12:30.611161] W [glusterd-op-sm.c:4066:glusterd_op_modify_op_ctx] 0-management: op_ctx modification failed
[2015-10-17 16:12:30.612433] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Software
[2015-10-17 16:12:30.618444] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Source
[2015-10-17 16:12:30.624005] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume homegfs
[2015-10-17 16:12:33.993869] I [glusterd-handler.c:3836:__glusterd_handle_status_volume] 0-management: Received status volume req for volume homegfs
[2015-10-17 16:12:38.605389] I [glusterd-handler.c:1296:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
 [root@gfs01b glusterfs]# gluster volume status homegfs
 Status of volume: homegfs
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick gfsib01a.corvidtec.com:/data/brick01a/homegfs     49152   Y       3820
Brick gfsib01b.corvidtec.com:/data/brick01b/homegfs     49152   Y       3808
Brick gfsib01a.corvidtec.com:/data/brick02a/homegfs     49153   Y       3825
Brick gfsib01b.corvidtec.com:/data/brick02b/homegfs     49153   Y       3813
Brick gfsib02a.corvidtec.com:/data/brick01a/homegfs     49152   Y       3967
Brick gfsib02b.corvidtec.com:/data/brick01b/homegfs     49152   Y       3952
Brick gfsib02a.corvidtec.com:/data/brick02a/homegfs     49153   Y       3972
Brick gfsib02b.corvidtec.com:/data/brick02b/homegfs     49153   Y       3957
NFS Server on localhost                                 2049    Y       3822
Self-heal Daemon on localhost                           N/A     Y       3827
NFS Server on 10.200.70.1                               2049    Y       3834
Self-heal Daemon on 10.200.70.1                         N/A     Y       3839
NFS Server on gfsib02a.corvidtec.com                    2049    Y       3981
Self-heal Daemon on gfsib02a.corvidtec.com              N/A     Y       3986
NFS Server on gfsib02b.corvidtec.com                    2049    Y       3966
Self-heal Daemon on gfsib02b.corvidtec.com              N/A     Y       3971

 Task Status of Volume homegfs
------------------------------------------------------------------------------
 Task : Rebalance
 ID : 58b6cc76-c29c-4695-93fe-c42b1112e171
 Status : completed



 ========================



 David F. Robinson, Ph.D.

 President - Corvid Technologies

 145 Overhill Drive

 Mooresville, NC 28117

 704.799.6944 x101 [Office]

 704.252.1310 [Cell]

 704.799.7974 [Fax]

david.robinson@xxxxxxxxxxxxx

 http://www.corvidtec.com

 _______________________________________________
 Gluster-users mailing list
 Gluster-users@xxxxxxxxxxx
 http://www.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel


