Re: Stale locks on shards

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Tue, 23 Jan 2018 13:04:13 +0530

On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:

Hi again,

here is more information regarding the issue I described earlier:

It looks like self-healing is stuck. According to "heal statistics", the
crawl began at Sat Jan 20 12:56:19 2018 and it is still going on (it's
around Sun Jan 21 20:30 as I write this). However, glustershd.log says
that the last heal was completed at "2018-01-20 11:00:13.090697" (which is
13:00 UTC+2). Also, "heal info" has now been running for over 16 hours
without returning any information. In the statedump I can see that the
storage nodes hold locks on files and that some of those locks are
blocked. Here again it says that ovirt8z2 holds an active lock, even
though ovirt8z2 crashed after the lock was granted:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
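When a statedump has many inode sections, it can help to pull the inodelk entries out mechanically and see which domain each lock sits in. Below is a minimal Python sketch; the regex is inferred from the excerpt above, not from a format specification. The pid 18446744073709551610 is -6 read as an unsigned 64-bit value; as far as I know gluster's internal clients (such as the self-heal daemon) use negative pids, which would fit the lock in the self-heal domain, but treat that reading as an assumption.

```python
import re

# Regex for one inodelk line, inferred from the statedump excerpt above.
LOCK_RE = re.compile(
    r"inodelk\.inodelk\[\d+\]\((?P<state>ACTIVE|BLOCKED)\)=type=(?P<type>\w+), "
    r"whence=\d+, start=\d+, len=\d+, pid = (?P<pid>\d+), owner=(?P<owner>[0-9a-f]+), "
    r"client=(?P<client>0x[0-9a-f]+), connection-id=(?P<conn>[^,\s]+), "
    r"(?:granted|blocked) at (?P<when>\S+ \S+)"
)


def signed_pid(raw: int) -> int:
    """Reinterpret a pid printed as an unsigned 64-bit number as a signed value."""
    return raw - (1 << 64) if raw >= (1 << 63) else raw


def parse_inode_section(text: str) -> list:
    """Return one dict per inodelk line, tagged with the lock domain it follows."""
    domain = None
    locks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("lock-dump.domain.domain="):
            domain = line.split("=", 1)[1]
            continue
        m = LOCK_RE.match(line)
        if m:
            lock = m.groupdict()
            lock["pid"] = signed_pid(int(lock["pid"]))
            lock["domain"] = domain
            locks.append(lock)
    return locks


SAMPLE = "\n".join([
    "lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal",
    "inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, "
    "pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, "
    "connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, "
    "granted at 2018-01-20 10:59:52",
    "lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata",
    "lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0",
    "inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, "
    "pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, "
    "connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, "
    "granted at 2018-01-20 08:57:23",
    "inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, "
    "pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, "
    "connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, "
    "blocked at 2018-01-20 10:59:52",
])

for lock in parse_inode_section(SAMPLE):
    # state, signed pid, lock domain, and the leading host part of the connection-id
    print(lock["state"], lock["pid"], lock["domain"], lock["conn"].split("-")[0])
```

Filtering the result on the connection-id prefix of the crashed hypervisor then lists the locks it still appears to hold.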

I'd also like to add that the volume had an arbiter brick before the crash
happened. We decided to remove it because we thought it was causing
issues, but now I think that was unnecessary. After the crash, the
arbiter logs had lots of messages like this:

[2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]
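To see whether those SETATTR failures are spread across many files or concentrated on a few, one option is to count them per gfid. A minimal Python sketch follows; the pattern is inferred from the single log line above and may need adjusting for other messages.

```python
import re
from collections import Counter

# Pattern inferred from the one SETATTR failure shown above; not a log-format spec.
SETATTR_EPERM_RE = re.compile(
    r"\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) \[MSGID: (?P<msgid>\d+)\] "
    r"\[server-rpc-fops\.c:\d+:server_setattr_cbk\] \S+ \d+: SETATTR "
    r"<gfid:(?P<gfid>[0-9a-f-]{36})> .*\(Operation not permitted\)"
)


def count_eperm_by_gfid(lines) -> Counter:
    """Count SETATTR 'Operation not permitted' failures per gfid."""
    counts = Counter()
    for line in lines:
        m = SETATTR_EPERM_RE.match(line)
        if m:
            counts[m.group("gfid")] += 1
    return counts


SAMPLE_LINE = (
    "[2018-01-20 10:19:36.515717] I [MSGID: 115072] "
    "[server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: "
    "37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> "
    "(a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) "
    "[Operation not permitted]"
)

print(count_eperm_by_gfid([SAMPLE_LINE, SAMPLE_LINE, "some other log line"]))
```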

Is there any way to force self-heal to stop? Any help would be very much
appreciated :)

Exposing .shard to a normal mount is opening a can of worms. You should probably look at mounting the volume with the aux-gfid-mount option, where you can access a file as <path-to-mount>/.gfid/<gfid-string> to clear locks on it.

Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol
A gfid string contains hyphens, for example: 11118443-1894-4273-9340-4b212fa1c0e4
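A small Python sketch of building the aux-gfid path described above, with a sanity check that the gfid string is in the hyphenated UUID form. The mount point and gfid are the example values from this reply; the helper name `gfid_path` is hypothetical.

```python
import posixpath
import uuid


def gfid_path(mount_point: str, gfid: str) -> str:
    """Hypothetical helper: return <mount_point>/.gfid/<gfid> for an aux-gfid mount."""
    canonical = str(uuid.UUID(gfid))  # raises ValueError if it is not 32 hex digits
    if canonical != gfid.lower():
        # uuid.UUID also accepts unhyphenated input; insist on the hyphenated form
        raise ValueError(f"expected hyphenated gfid, got {gfid!r}")
    return posixpath.join(mount_point, ".gfid", canonical)


print(gfid_path("/mnt/testvol", "11118443-1894-4273-9340-4b212fa1c0e4"))
# /mnt/testvol/.gfid/11118443-1894-4273-9340-4b212fa1c0e4
```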
That said, the next disconnect on the brick where you successfully ran clear-locks will crash the brick. There was a bug in the 3.8.x series with clear-locks, which was fixed in 3.9.0 with a feature. The self-heal deadlocks that you witnessed are also fixed in the 3.10 release.

3.8.x is EOL, so I recommend upgrading to a supported version soon.
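The version reasoning above can be expressed as a tiny Python check. It encodes only what this reply states (clear-locks bug in the 3.8.x series, self-heal deadlocks fixed in 3.10); treating every release before 3.10 as affected by the deadlock is an assumption, and `known_issues` is a hypothetical helper.

```python
def parse_version(version: str) -> tuple:
    """Turn '3.8.15' into (3, 8, 15) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def known_issues(version: str) -> dict:
    """Hypothetical helper flagging the two issues named in this reply."""
    v = parse_version(version)
    return {
        # clear-locks bug reported for the 3.8.x series (fixed in 3.9.0)
        "clear_locks_brick_crash": v[:2] == (3, 8),
        # self-heal deadlocks fixed in 3.10; older releases assumed affected
        "self_heal_deadlock": v < (3, 10),
    }


print(known_issues("3.8.15"))
```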

Best regards,

Samuli Heinonen

Samuli Heinonen wrote on 20 January 2018 at 21.57:

Hi all!

One hypervisor in our virtualization environment crashed, and now some of
the VM images cannot be accessed. After investigation we found that
there were lots of images that still had active locks held by the crashed
hypervisor. We were able to remove locks from "regular files", but it
doesn't seem possible to remove locks from shards.

We are running GlusterFS 3.8.15 on all nodes.

Here is the part of the statedump that shows a shard with an active lock
held by the crashed node:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
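The connection-id appears to encode the client host, a process id and the time the connection was set up, followed by the client translator name. That layout is inferred only from the connection-ids in this thread, not from documentation. A Python sketch that splits it apart, useful for grouping locks by host:

```python
import re
from datetime import datetime

# Layout inferred from the connection-ids in this thread (an assumption):
#   <host>-<pid>-<YYYY/MM/DD>-<HH:MM:SS>:<usec>-<client xlator and counters>
CONN_RE = re.compile(
    r"(?P<host>.+?)-(?P<pid>\d+)-(?P<date>\d{4}/\d{2}/\d{2})-"
    r"(?P<time>\d{2}:\d{2}:\d{2}):(?P<usec>\d+)-(?P<rest>.+)"
)


def parse_connection_id(conn: str) -> dict:
    """Split a connection-id into host, pid, setup time and the remaining suffix."""
    m = CONN_RE.fullmatch(conn)
    if not m:
        raise ValueError(f"unrecognised connection-id: {conn!r}")
    started = datetime.strptime(f"{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S")
    return {"host": m["host"], "pid": int(m["pid"]), "started": started, "rest": m["rest"]}


info = parse_connection_id(
    "ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0"
)
print(info["host"], info["pid"], info["started"])
# ovirt8z2.xxx 5652 2017-12-27 09:49:02
```

Comparing the short host name (`info["host"].split(".")[0]`) against a list of hosts known to have crashed is one way to flag stale holders.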

If we try to run clear-locks, we get the following error message:

# gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode

Volume clear-locks unsuccessful

clear-locks getxattr command failed. Reason: Operation not permitted

Gluster vol info if needed:

Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
Options Reconfigured:
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
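With features.shard-block-size: 512MB, the shard suffix tells you which region of the VM image a locked shard covers. As I understand the shard translator, the base file holds block 0 and /.shard/<gfid>.<N> holds block N, so shard .21 should cover roughly 10.5 to 11 GiB of the image. A Python sketch under that assumption, also assuming gluster's size suffixes are binary (512MB = 512 * 2^20 bytes); the helper names are hypothetical.

```python
import re

# Assumed binary multiples for gluster size suffixes.
UNITS = {"KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30}


def parse_size(text: str) -> int:
    """Hypothetical helper: turn '512MB' into a byte count."""
    m = re.fullmatch(r"(\d+)\s*(KB|MB|GB)", text.strip().upper())
    if not m:
        raise ValueError(f"unsupported size: {text!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def shard_byte_range(shard_name: str, block_size: int) -> tuple:
    """Map '<gfid>.<N>' to the half-open byte range [N*size, (N+1)*size)."""
    index = int(shard_name.rsplit(".", 1)[1])
    start = index * block_size
    return start, start + block_size


block = parse_size("512MB")
start, end = shard_byte_range("75353c17-d6b8-485d-9baf-fd6c700e39a1.21", block)
print(start / 2**30, end / 2**30)  # 10.5 11.0 (GiB)
```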

Any recommendations on how to proceed from here?

Best regards,

Samuli Heinonen

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

http://lists.gluster.org/mailman/listinfo/gluster-users

-- 
Pranith
