root@chastcvtprd04:~# gluster peer status
Number of Peers: 1
Hostname: chglbcvtprd04
Uuid: bb12d7e7-ded5-4d32-b294-8f5011f70afb
State: Peer in Cluster (Connected)
Other names:
chglbcvtprd04.fpprod.corp
root@chastcvtprd04:~# glusterd statedump
root@chastcvtprd04:~# cat /var/log/glusterfs/statedump.log
[2016-10-20 14:57:04.636547] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-glusterd: Started running glusterd version 3.8.5 (args: glusterd statedump)
[2016-10-20 14:57:04.636599] E [MSGID: 100007] [glusterfsd.c:578:create_fuse_mount] 0-glusterfsd: Not a client process, not performing mount operation
root@chastcvtprd04:~#
root@chglbcvtprd04:~# gluster peer status
Number of Peers: 1
Hostname: chastcvtprd04.fpprod.corp
Uuid: 82aef154-8444-46bb-9fd5-d7eaf4f0a6bc
State: Peer in Cluster (Connected)
Other names:
chastcvtprd04
root@chglbcvtprd04:~# glusterd statedump
root@chglbcvtprd04:~# cat /var/log/glusterfs/statedump.log
[2016-10-20 14:59:21.047747] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-glusterd: Started running glusterd version 3.8.5 (args: glusterd statedump)
[2016-10-20 14:59:21.047785] E [MSGID: 100007] [glusterfsd.c:578:create_fuse_mount] 0-glusterfsd: Not a client process, not performing mount operation
root@chglbcvtprd04:~#
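From the statedump.log output above it looks like running "glusterd statedump" only started a second glusterd process (note the "args: glusterd statedump" in the log) rather than producing a dump. A statedump of the already-running glusterd is usually triggered by sending it SIGUSR1; the dump file is then written under /var/run/gluster by default (path assumed from stock packaging). A minimal sketch:
# kill -SIGUSR1 $(pidof glusterd)
# ls -lt /var/run/gluster/ | head
The newest glusterdump.<pid>.dump.* file there should be the glusterd statedump to share.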
I had a chance to look at the logs from both nodes. I could see the following log message repeated (on both nodes):
"Lock for Oracle_Legal_04 held by bb12d7e7-ded5-4d32-b294-8f5011f70afb"
This means the node with that UUID is the one holding the lock. However, from the logs you shared I cannot tell which node has this UUID. Could you please share the gluster peer status output? If you find the node that has the UUID mentioned above, could you take a glusterd statedump on it and share it with us?

On Tue, Oct 18, 2016 at 2:11 PM, Bernhard Duebi <bernhard@xxxxxx> wrote:

Hello,
I'm running gluster 3.8.5 on Ubuntu 16.04. I have 2 nodes which mirror
each other. There are 32 volumes and all have the same configuration:
Type: Replicate
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: node01:/data/glusterfs/vol/disk/brick
Brick2: node02:/data/glusterfs/vol/disk/brick
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.readdir-ahead: on
nfs.disable: on
auth.allow: 127.0.0.1,10.11.12.21,10.11.12.22
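For reference, the "Options Reconfigured" values above are per-volume settings of the kind applied with gluster volume set; a hedged sketch using $vol as a placeholder volume name:
# gluster volume set $vol diagnostics.latency-measurement on
# gluster volume set $vol diagnostics.count-fop-hits on
# gluster volume set $vol performance.readdir-ahead on
# gluster volume set $vol nfs.disable on
# gluster volume set $vol auth.allow 127.0.0.1,10.11.12.21,10.11.12.22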
Nagios runs the following every 5 minutes for each volume:
# gluster volume heal $vol info
# gluster volume status $vol detail
Diamond runs every minute:
# gluster volume list
and then, for every volume:
# gluster volume profile $vol info cumulative --xml
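Put together, the per-minute collection amounts to something like the loop below (a hedged reconstruction, not the actual Diamond collector). With 32 volumes, these CLI calls can easily overlap with each other and with the 5-minute Nagios checks, which is relevant to the locking errors described next:
for vol in $(gluster volume list); do
    gluster volume profile "$vol" info cumulative --xml
done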
This was running fine with Gluster 3.7, but since I upgraded to 3.8.5 I see a lot of problems with locking. After a reboot of both machines everything is fine, but after a while gluster volume status gives me the following error:
Another transaction is in progress for $vol. Please try again after sometime
The problem is that the system never recovers; only rebooting the machines helps. OK, a restart of gluster would probably do too.
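Restarting only the management daemon should indeed be enough to clear a stale volume lock without a full reboot; on Ubuntu 16.04 with systemd the unit name depends on the packaging (assumed below), so one of the following should apply:
# systemctl restart glusterfs-server
# systemctl restart glusterd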
I attached the logfiles from both glusterd. Let me know if you need
more information.
Thanks
Bernhard
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users