Re: Stale locks on shards

On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:
Pranith Kumar Karampuri wrote on 25.01.2018 07:09:
On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:

Hi!

Thank you very much for your help so far. Could you please give an
example command showing how to use aux-gfid-mount to remove locks?
"gluster vol clear-locks" seems to mount the volume by itself.

You are correct, sorry. This was implemented around 7 years ago and I
had forgotten that detail :-(. Essentially, clear-locks becomes a
getxattr syscall on the file.
Could you give me the clear-locks command you were trying to execute?
I can probably convert it to the equivalent getfattr command.

I have been testing this in a test environment with the command:
gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode

Could you strace glusterd when this happens? It will issue a getxattr with "glusterfs.clrlk" in the key. You need to execute that getxattr on the aux-gfid mount.
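A minimal Python sketch of the getxattr approach described above, i.e. sending the clear-locks request straight to a file on an aux-gfid mount. The full key layout `glusterfs.clrlk.t<type>.k<kind>` is an assumption (only the `glusterfs.clrlk` prefix is stated in this thread), so confirm it with strace first; the helper names and mount point are hypothetical.

```python
import os
import uuid

# Values accepted by "gluster volume clear-locks ... kind <kind> <type>".
LOCK_TYPES = ("inode", "entry", "posix")
LOCK_KINDS = ("blocked", "granted", "all")


def clrlk_key(lock_type: str = "inode", kind: str = "all") -> str:
    """Build the clear-locks xattr key.

    ASSUMPTION: the key is "glusterfs.clrlk.t<type>.k<kind>". Only the
    "glusterfs.clrlk" prefix is confirmed in this thread; check the full
    key with strace of glusterd before relying on it.
    """
    if lock_type not in LOCK_TYPES:
        raise ValueError(f"unknown lock type: {lock_type!r}")
    if kind not in LOCK_KINDS:
        raise ValueError(f"unknown lock kind: {kind!r}")
    return f"glusterfs.clrlk.t{lock_type}.k{kind}"


def gfid_path(mount_point: str, gfid: str) -> str:
    """Path of a file on a volume mounted with -o aux-gfid-mount."""
    # uuid.UUID() rejects malformed gfids; str() gives the lowercase form.
    return os.path.join(mount_point, ".gfid", str(uuid.UUID(gfid)))


def clear_locks(mount_point: str, gfid: str,
                lock_type: str = "inode", kind: str = "all") -> bytes:
    """Send the clear-locks request as a getxattr (Linux only).

    Returns the raw xattr value; its format is not assumed here.
    """
    return os.getxattr(gfid_path(mount_point, gfid), clrlk_key(lock_type, kind))
```

With a volume mounted at the hypothetical `/mnt/testvol`, Samuli's test command would correspond to `clear_locks("/mnt/testvol", "14341ccb-df7b-4f92-90d5-7814431c5a1c")`.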

Best regards,
Samuli Heinonen

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
23 January 2018 at 10.30

On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:

Pranith Kumar Karampuri wrote on 23.01.2018 09:34:

On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:

Hi again,

here is more information regarding the issue described earlier.

It looks like self-healing is stuck. According to "heal statistics",
the crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
(it's around Sun Jan 21 20:30 as I write this). However,
glustershd.log says that the last heal was completed at "2018-01-20
11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has now
been running for over 16 hours without returning any information. In
the statedump I can see that the storage nodes hold locks on files and
some of those are blocked. For example, here again it says that
ovirt8z2 holds an active lock even though ovirt8z2 crashed after the
lock was granted:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
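To pick stale locks out of a large statedump, a minimal Python parser for inodelk entries like the ones above may help. It is only checked against the excerpts in this thread, and the names are hypothetical. It also converts the pid: 18446744073709551610 is -6 as a signed 64-bit value, which, as far as I know, glusterfs uses for the self-heal daemon's internal client; that fits the `:self-heal` domain and the sto2z2 connection.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One inodelk entry of a locks-xlator statedump, as in the excerpt above.
INODELK_RE = re.compile(
    r"inodelk\.inodelk\[\d+\]\((?P<state>ACTIVE|BLOCKED)\)="
    r"type=(?P<type>\w+),.*?pid = (?P<pid>\d+), owner=(?P<owner>[0-9a-f]+), "
    r"client=0x[0-9a-f]+, connection-id=(?P<conn>[^,]+), "
    r"(?:granted|blocked) at (?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)


@dataclass
class InodeLock:
    domain: str         # e.g. "...-replicate-0:self-heal"
    state: str          # "ACTIVE" or "BLOCKED"
    lock_type: str      # "WRITE" or "READ"
    pid: int            # converted to a signed value
    owner: str
    connection_id: str
    when: str           # "granted at" / "blocked at" timestamp


def signed_pid(raw: int) -> int:
    """The dump prints pid as an unsigned 64-bit number; undo that."""
    return raw - 2**64 if raw >= 2**63 else raw


def parse_inodelks(dump: str) -> List[InodeLock]:
    """Collect inodelk entries, remembering which lock domain each is in."""
    locks: List[InodeLock] = []
    domain: Optional[str] = None
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("lock-dump.domain.domain="):
            domain = line.split("=", 1)[1]
            continue
        m = INODELK_RE.match(line)
        if m and domain is not None:
            locks.append(InodeLock(domain, m["state"], m["type"],
                                   signed_pid(int(m["pid"])), m["owner"],
                                   m["conn"], m["when"]))
    return locks


def locks_from_host(locks: List[InodeLock], host: str) -> List[InodeLock]:
    """Locks whose connection-id starts with the given short hostname."""
    return [lk for lk in locks if lk.connection_id.split(".", 1)[0] == host]
```

Feeding the excerpt above to `parse_inodelks` and then `locks_from_host(..., "ovirt8z2")` leaves exactly the pid 3420 lock granted at 2018-01-20 08:57:23.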

I'd also like to add that the volume had an arbiter brick before the
crash happened. We decided to remove it because we thought that it
was causing issues. However, now I think that this was unnecessary.
After the crash, the arbiter logs had lots of messages like this:
[2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]

Is there any way to force self-heal to stop? Any help would be very
much appreciated :)

Exposing .shard to a normal mount is opening a can of worms. You
should probably look at mounting the volume with the aux-gfid-mount
option, where you can access a file as
<path-to-mount>/.gfid/<gfid-string> to clear locks on it.

Mount command:  mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

A gfid string will have some hyphens, like:
11118443-1894-4273-9340-4b212fa1c0e4
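To get the gfid string for a particular shard, one option is to read it on a brick: every file on a brick carries its gfid in the `trusted.gfid` xattr as 16 raw bytes (`getfattr -n trusted.gfid -e hex <file>` shows it in hex). A minimal Python sketch of the conversion; the helper names are hypothetical and reading `trusted.*` xattrs requires root.

```python
import os
import uuid


def gfid_from_bytes(raw: bytes) -> str:
    """Format the 16 raw bytes of trusted.gfid as a hyphenated gfid string."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


def gfid_of_brick_file(brick_file: str) -> str:
    """Read trusted.gfid from a file on a brick (Linux, needs root)."""
    return gfid_from_bytes(os.getxattr(brick_file, "trusted.gfid"))


def aux_gfid_path(mount_point: str, gfid: str) -> str:
    """Where the same file appears on a mount made with -o aux-gfid-mount."""
    return os.path.join(mount_point, ".gfid", gfid)
```

For example, with the brick path from the volume info in this thread (assumed, not verified), `gfid_of_brick_file("/ssd1/zone2-vmstor1/export/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21")` would give the shard's own gfid, which is then used under `/mnt/testvol/.gfid/`.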

That said, the next disconnect on the brick where you successfully
did the clear-locks will crash the brick. There was a bug with
clear-locks in the 3.8.x series which was fixed in 3.9.0 as part of a
feature patch. The self-heal deadlocks that you witnessed are also
fixed in the 3.10 release.

Thank you for the answer. Could you please tell me more about the
crash? What will actually happen, or is there a bug report about it?
We just want to make sure that we can do everything to secure the
data on the bricks. We will look into upgrading, but we have to make
sure that the new version works for us and, of course, get
self-healing working before doing anything :)

The locks xlator/module maintains a list of the locks that are
granted to each client. Clear-locks had an issue where it forgot to
remove the lock from this list, so after a clear-locks the
connection's list ends up pointing to data that has already been
freed. When a disconnect happens, all the locks that are granted to
that client need to be unlocked, so the process starts traversing
this list, and when it tries to access the freed data it crashes. I
found this while reviewing a feature patch sent by the Facebook folks
to the locks xlator (http://review.gluster.org/14816 [2]) for 3.9.0,
and they fixed this bug as part of that feature patch.
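A toy Python model of the bug described above, purely illustrative: the real locks xlator is C with its own list structures, and every class and function name here is invented for the sketch. A `freed` flag stands in for memory handed back to the allocator, so the use-after-free that crashes the brick shows up as an exception.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare locks by identity, like pointers in C
class Lock:
    owner: str
    freed: bool = False   # stands in for "memory returned to the allocator"


@dataclass
class Inode:
    granted: List[Lock] = field(default_factory=list)


@dataclass
class Client:
    granted: List[Lock] = field(default_factory=list)


def grant(inode: Inode, client: Client, lock: Lock) -> None:
    # A granted lock is linked on two lists: the inode's and the client's.
    inode.granted.append(lock)
    client.granted.append(lock)


def clear_locks(inode: Inode, client: Client, fixed: bool) -> None:
    """Drop every granted lock on the inode and free it."""
    for lock in list(inode.granted):
        inode.granted.remove(lock)
        if fixed and lock in client.granted:
            client.granted.remove(lock)   # the step the buggy version forgot
        lock.freed = True


def disconnect(client: Client) -> int:
    """Unlock everything the client still holds; return how many locks."""
    released = 0
    for lock in client.granted:
        if lock.freed:
            # In C this is a use-after-free and the brick crashes.
            raise RuntimeError("use-after-free: lock already freed by clear-locks")
        lock.freed = True
        released += 1
    client.granted.clear()
    return released
```

With `fixed=False`, a later `disconnect` walks into the freed lock; with `fixed=True`, the client's list no longer references it and the disconnect is clean.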

Br,
Samuli

3.8.x is EOLed, so I recommend that you upgrade to a supported
version soon.

Best regards,
Samuli Heinonen

Samuli Heinonen
20 January 2018 at 21.57

Hi all!

One hypervisor in our virtualization environment crashed, and now
some of the VM images cannot be accessed. After investigating, we
found that many images still had active locks held by the crashed
hypervisor. We were able to remove locks from "regular files", but it
doesn't seem possible to remove locks from shards.

We are running GlusterFS 3.8.15 on all nodes.

Here is the part of the statedump that shows a shard with an active
lock held by the crashed node:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24

If we try to run clear-locks, we get the following error message:
# gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
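For scripting, a small Python sketch that assembles the clear-locks CLI call in the form used in this thread (`gluster volume clear-locks <volume> <path> kind {blocked|granted|all} {inode|entry|posix}`; the optional range/basename arguments are left out). It accepts either a volume path or a gfid, turning the latter into the `/.gfid/<gfid>` form Samuli tested later in the thread; whether the 3.8.x CLI resolves that form is not confirmed here. The helper name is hypothetical, and it only builds the argument list.

```python
import shlex
import uuid
from typing import List, Optional

KINDS = ("blocked", "granted", "all")
LOCK_TYPES = ("inode", "entry", "posix")


def clear_locks_argv(volume: str, kind: str, lock_type: str,
                     path: Optional[str] = None,
                     gfid: Optional[str] = None) -> List[str]:
    """Build argv for 'gluster volume clear-locks' (range/basename omitted).

    Pass either a volume-relative path or a gfid; a gfid is turned into
    the /.gfid/<gfid> form.
    """
    if (path is None) == (gfid is None):
        raise ValueError("give exactly one of path or gfid")
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if lock_type not in LOCK_TYPES:
        raise ValueError(f"unknown lock type: {lock_type!r}")
    # uuid.UUID() validates the gfid and normalizes it to lowercase.
    target = path if path is not None else f"/.gfid/{uuid.UUID(gfid)}"
    return ["gluster", "volume", "clear-locks", volume, target,
            "kind", kind, lock_type]


if __name__ == "__main__":
    argv = clear_locks_argv("g1", "all", "inode",
                            gfid="14341ccb-df7b-4f92-90d5-7814431c5a1c")
    print(shlex.join(argv))  # inspect before passing argv to subprocess.run
```

Returning a list rather than a shell string keeps the arguments safe to hand to `subprocess.run` without a shell.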

Gluster vol info, if needed:
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
Options Reconfigured:
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
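The `features.shard-block-size: 512MB` option above determines which part of a VM image each locked shard covers. With sharding, block 0 lives in the base file and block n (n >= 1) is stored as `/.shard/<gfid-of-base-file>.<n>`, so the shard name also identifies which image it belongs to. A minimal Python sketch, assuming Gluster's "512MB" means 512 MiB; the helper names are hypothetical.

```python
import re
from typing import Tuple

# features.shard-block-size: 512MB from the volume options above.
# ASSUMPTION: "MB" here means MiB (2**20 bytes).
SHARD_BLOCK_SIZE = 512 * 2**20

SHARD_RE = re.compile(r"^/\.shard/(?P<gfid>[0-9a-f-]{36})\.(?P<index>\d+)$")


def parse_shard_path(path: str) -> Tuple[str, int]:
    """Split '/.shard/<base-gfid>.<n>' into the base file's gfid and n."""
    m = SHARD_RE.match(path)
    if m is None:
        raise ValueError(f"not a shard path: {path!r}")
    return m["gfid"], int(m["index"])


def shard_byte_range(index: int, block_size: int = SHARD_BLOCK_SIZE) -> Tuple[int, int]:
    """Half-open byte range [start, end) of the base file held by shard n."""
    start = index * block_size
    return start, start + block_size


def shard_for_offset(offset: int, block_size: int = SHARD_BLOCK_SIZE) -> int:
    """Which shard index holds a given byte offset of the image."""
    return offset // block_size
```

For the locked shard `/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27`, `shard_byte_range(27)` gives bytes 14495514624 up to 15032385536, i.e. the 13.5 to 14 GiB region of the image whose gfid is 3d55f8cc-cda9-489a-b0a3-fd0f43d67876.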

Any recommendations on how to proceed from here?

Best regards,
Samuli Heinonen

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users [3]

--

Pranith


--
Pranith
--

Pranith


Links:
------
[1] http://ovirt8z2.xxx.com
[2] http://review.gluster.org/14816
[3] http://lists.gluster.org/mailman/listinfo/gluster-users



--
Pranith
