Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
23 January 2018 at 10.30
On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:
Pranith Kumar Karampuri wrote on 23.01.2018 at 09:34:
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:
Hi again,

Here is more information regarding the issue described earlier.

It looks like self-healing is stuck. According to "heal statistics", the crawl began at Sat Jan 20 12:56:19 2018 and it is still going on (it's around Sun Jan 21 20:30 as I write this). However, glustershd.log says that the last heal was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has now been running for over 16 hours without printing any information. In the statedump I can see that the storage nodes hold locks on files, and some of those locks are blocked. I.e., here again it says that ovirt8z2 holds an active lock even though ovirt8z2 crashed after the lock was granted:
[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
mandatory=0
inodelk-count=3
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
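(These excerpts come from brick statedumps; for reference, a dump can be triggered with "gluster volume statedump zone2-ssd1-vmstor1", and by default the dump files should end up under /var/run/gluster on the brick nodes.)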
I'd also like to add that the volume had an arbiter brick before the crash happened. We decided to remove it because we thought it was causing issues. However, I now think that this was unnecessary. After the crash, the arbiter logs had lots of messages like this:

[2018-01-20 10:19:36.515717] I [MSGID: 115072] [server-rpc-fops.c:1640:server_setattr_cbk] 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe> (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not permitted) [Operation not permitted]
Is there any way to force self-heal to stop? Any help would be very much appreciated :)
Exposing .shard to a normal mount is opening a can of worms. You should probably look at mounting the volume with the gfid aux-mount, where you can access a file via <path-to-mount>/.gfid/<gfid-string> to clear the locks on it.

Mount command: mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

A gfid string will have some hyphens, like: 11118443-1894-4273-9340-4b212fa1c0e4
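If you need to look up a file's gfid, one way (assuming direct access to the brick backend; the output below is hypothetical) is to read its trusted.gfid xattr on a brick and re-insert the hyphens into the hex value:

# getfattr -n trusted.gfid -e hex /ssd1/zone2-vmstor1/export/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
trusted.gfid=0x111184431894427393404b212fa1c0e4

which would correspond to the gfid string 11118443-1894-4273-9340-4b212fa1c0e4.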
That said: the next disconnect on the brick where you successfully did the clear-locks will crash the brick. There was a bug in the 3.8.x series with clear-locks which was fixed in 3.9.0 as part of a feature patch. The self-heal deadlocks that you witnessed are also fixed in the 3.10 release.
Thank you for the answer. Could you please tell me more about the crash? What will actually happen, or is there a bug report about it? I just want to make sure that we do everything we can to secure the data on the bricks. We will look into the upgrade, but we have to make sure that the new version works for us, and of course get self-healing working, before doing anything :)
The locks xlator/module maintains a list of the locks that are granted to each client. Clear-locks had an issue where it forgot to remove the cleared lock from this list, so after a clear-lock the client's connection list ends up pointing at data that has been freed. When a disconnect happens, all the locks granted to that client need to be unlocked, so the process starts traversing this list, and as soon as it tries to access the freed data it crashes. I found this while reviewing a feature patch sent by the Facebook folks to the locks xlator (http://review.gluster.org/14816) for 3.9.0, and they fixed this bug as part of that feature patch.
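In other words, it is a classic use-after-free on a linked list. A minimal sketch of the pattern in C (an illustration only, not the actual locks-xlator code):

#include <stdlib.h>

struct lock {
    struct lock *next;
    int granted;
};

struct client {
    struct lock *granted_locks; /* list of locks granted to this client */
};

/* clear-locks path: frees the lock but forgets to unlink it, so
   client->granted_locks still reaches the freed node */
void buggy_clear_lock(struct client *c, struct lock *l)
{
    (void)c;
    free(l); /* BUG: l is never removed from c->granted_locks */
}

/* disconnect path: releases everything the client still holds;
   touching the freed node is undefined behaviour -> brick crash */
void on_disconnect(struct client *c)
{
    for (struct lock *l = c->granted_locks; l != NULL; l = l->next)
        l->granted = 0; /* use-after-free once l is the cleared lock */
}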
Br,
Samuli
3.8.x is EOLed, so I recommend upgrading to a supported version soon.
Best regards,
Samuli Heinonen
Samuli Heinonen
20 January 2018 at 21.57
Hi all!

One hypervisor in our virtualization environment crashed, and now some of the VM images cannot be accessed. After investigation we found out that there were lots of images that still had an active lock on the crashed hypervisor. We were able to remove the locks from "regular files", but it doesn't seem possible to remove locks from shards.

We are running GlusterFS 3.8.15 on all nodes.
Here is the part of a statedump that shows a shard having an active lock on the crashed node:

[xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
mandatory=0
inodelk-count=1
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
If we try to run clear-locks we get the following error message:

# gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
Volume clear-locks unsuccessful
clear-locks getxattr command failed. Reason: Operation not permitted
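(Note: clear-locks works by issuing a special getxattr request on the given path, which is why the failure above surfaces as a getxattr error. If I read the CLI help correctly, the general syntax is: gluster volume clear-locks <VOLNAME> <path> kind {blocked|granted|all} {inode [range] | entry [basename] | posix [range]}.)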
Gluster vol info if needed:
Volume Name: zone2-ssd1-vmstor1
Type: Replicate
Volume ID: b6319968-690b-4060-8fff-b212d2295208
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
Options Reconfigured:
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
performance.client-io-threads: off
storage.linux-aio: off
performance.readdir-ahead: on
client.event-threads: 16
server.event-threads: 16
performance.strict-write-ordering: off
performance.quick-read: off
performance.read-ahead: on
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: on
cluster.quorum-type: none
network.ping-timeout: 22
performance.write-behind: off
nfs.disable: on
features.shard: on
features.shard-block-size: 512MB
storage.owner-uid: 36
storage.owner-gid: 36
performance.io-thread-count: 64
performance.cache-size: 2048MB
performance.write-behind-window-size: 256MB
server.allow-insecure: on
cluster.ensure-durability: off
config.transport: rdma
server.outstanding-rpc-limit: 512
diagnostics.brick-log-level: INFO
Any recommendations on how to proceed from here?
Best regards,
Samuli Heinonen
--
Pranith
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users