Re: Stale locks on shards

On 29 Jan 2018 10:50 am, "Samuli Heinonen" <samppah@xxxxxxxxxxxxx> wrote:
Hi!

Yes, thank you for asking. I found this line in the production environment:
lgetxattr("/tmp/zone2-ssd1-vmstor1.s6jvPu//.shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32", "glusterfs.clrlk.tinode.kblocked", 0x7f2d7c4379f0, 4096) = -1 EPERM (Operation not permitted)

I was expecting .kall instead of .kblocked here.
Did you change the CLI command to use kind blocked?


And this one in test environment (with posix locks):
lgetxattr("/tmp/g1.gHj4Bw//file38", "glusterfs.clrlk.tposix.kblocked", "box1:/gluster/1/export/: posix blocked locks=1 granted locks=0", 4096) = 77

In the test environment I tried running the following command, which seemed to release the gluster locks:

getfattr -n glusterfs.clrlk.tposix.kblocked file38

So I think it would go like this in the production environment with locks on shards (using the aux-gfid-mount mount option):
getfattr -n glusterfs.clrlk.tinode.kall .shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32
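
Spelled out end to end, I think it would be roughly the following (the server name and mount point are only placeholders, and since the volume is configured for RDMA a transport mount option may also be needed):

# mount the volume with gfid access enabled; server/mountpoint are examples
mount -t glusterfs -o aux-gfid-mount sto1z2.xxx:/zone2-ssd1-vmstor1 /mnt/vmstor1-aux

# release all inode locks on the shard by reading the clear-locks virtual xattr
cd /mnt/vmstor1-aux
getfattr -n glusterfs.clrlk.tinode.kall .shard/f349ffbd-a423-4fb2-b83c-2d1d5e78e1fb.32

# unmount afterwards
cd / && umount /mnt/vmstor1-aux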

I haven't been able to try this out in the production environment yet.

Is there perhaps something else I should be aware of?

Would you be able to tell me more about bricks crashing after releasing locks? Under what circumstances does that happen? Is it only the process exporting the brick that crashes, or is there a possibility of data corruption?

No data corruption. The brick process where you ran clear-locks may crash.


Best regards,
Samuli Heinonen


Pranith Kumar Karampuri wrote:
Hi,
      Did you find the command from strace?

On 25 Jan 2018 1:52 pm, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx> wrote:



    On Thu, Jan 25, 2018 at 1:49 PM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:

        Pranith Kumar Karampuri wrote on 25.01.2018 07:09:

            On Thu, Jan 25, 2018 at 2:27 AM, Samuli Heinonen <samppah@xxxxxxxxxxxxx> wrote:

                Hi!

                Thank you very much for your help so far. Could you please give an
                example command showing how to use aux-gfid-mount to remove locks?
                "gluster vol clear-locks" seems to mount the volume by itself.


            You are correct, sorry, this was implemented around 7 years back and
            I forgot that bit about it :-(. Essentially it becomes a getxattr
            syscall on the file. Could you give me the clear-locks command you
            were trying to execute and I can probably convert it to the getfattr
            command?


        I have been testing this in a test environment with the command:
        gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode


    Could you do an strace of glusterd when this happens? It will have a
    getxattr with "glusterfs.clrlk" in the key. You need to execute that
    on the aux-gfid mount.
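
    For example, something along these lines (the pidof lookup and the grep
    filter are only an example; adjust to your setup):

    # shell 1: trace getxattr calls from glusterd and any helper processes it spawns
    strace -f -e trace=getxattr,lgetxattr -p "$(pidof glusterd)" 2>&1 | grep glusterfs.clrlk

    # shell 2: re-run the clear-locks command you used
    gluster vol clear-locks g1 /.gfid/14341ccb-df7b-4f92-90d5-7814431c5a1c kind all inode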




                Best regards,
                Samuli Heinonen

                    Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx>
                    23 January 2018 at 10.30

                    On Tue, Jan 23, 2018 at 1:38 PM, Samuli Heinonen
                    <samppah@xxxxxxxxxxxxx> wrote:

                    Pranith Kumar Karampuri wrote on 23.01.2018 09:34:

                    On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen
                    <samppah@xxxxxxxxxxxxx> wrote:

                    Hi again,

                    here is more information regarding the issue described earlier.

                    It looks like self healing is stuck. According to "heal statistics"
                    the crawl began at Sat Jan 20 12:56:19 2018 and it's still going on
                    (it's around Sun Jan 21 20:30 when writing this). However,
                    glustershd.log says that the last heal was completed at "2018-01-20
                    11:00:13.090697" (which is 13:00 UTC+2). Also, "heal info" has been
                    running now for over 16 hours without printing any information. In
                    the statedump I can see that the storage nodes have locks on files
                    and some of those are blocked. I.e. here again it says that
                    ovirt8z2 is holding an active lock even though ovirt8z2 crashed
                    after the lock was granted:

                    [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
                    path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
                    mandatory=0
                    inodelk-count=3
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
                    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
                    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
                    inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52

                    I'd also like to add that the volume had an arbiter brick before
                    the crash happened. We decided to remove it because we thought
                    that it was causing issues. However, now I think that this was
                    unnecessary. After the crash the arbiter logs had lots of messages
                    like this:
                    [2018-01-20 10:19:36.515717] I [MSGID: 115072]
                    [server-rpc-fops.c:1640:server_setattr_cbk]
                    0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
                    <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
                    (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
                    permitted) [Operation not permitted]

                    Is there any way to force self heal to stop? Any help would be
                    very much appreciated :)

                    Exposing .shard to a normal mount is opening a can of worms. You
                    should probably look at mounting the volume with the gfid
                    aux-mount, where you can access a file with
                    <path-to-mount>/.gfid/<gfid-string> to clear locks on it.

                    Mount command:  mount -t glusterfs -o aux-gfid-mount vm1:test /mnt/testvol

                    A gfid string will have some hyphens like:
                    11118443-1894-4273-9340-4b212fa1c0e4
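
                    For instance, once mounted like that, a file can be addressed
                    purely by its gfid (using the example string above):

                    stat /mnt/testvol/.gfid/11118443-1894-4273-9340-4b212fa1c0e4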

                    That said, the next disconnect on the brick where you successfully
                    did the clear-locks will crash the brick. There was a bug in the
                    3.8.x series with clear-locks which was fixed in 3.9.0 as part of
                    a feature. The self-heal deadlocks that you witnessed are also
                    fixed in the 3.10 release.

                    Thank you for the answer. Could you please tell me more about the
                    crash? What will actually happen, or is there a bug report about
                    it? I just want to make sure that we can do everything to secure
                    the data on the bricks. We will look into upgrading, but we have
                    to make sure that the new version works for us, and of course get
                    self healing working before doing anything :)

                    The locks xlator/module maintains a list of locks that are granted
                    to a client. Clear-locks had an issue where it forgot to remove
                    the lock from this list, so after a clear-lock the connection's
                    list ends up pointing to data that has been freed. When a
                    disconnect happens, all the locks that are granted to a client
                    need to be unlocked, so the process starts traversing through this
                    list, and when it tries to access this freed data it leads to a
                    crash. I found it while reviewing a feature patch sent by the
                    Facebook folks to the locks xlator
                    (http://review.gluster.org/14816) for 3.9.0, and they also fixed
                    this bug as part of that feature patch.

                    Br,
                    Samuli

                    3.8.x is EOLed, so I recommend you upgrade to a supported version
                    soon.

                    Best regards,
                    Samuli Heinonen

                    Samuli Heinonen
                    20 January 2018 at 21.57

                    Hi all!

                    One hypervisor in our virtualization environment crashed and now
                    some of the VM images cannot be accessed. After investigation we
                    found out that there were lots of images that still had an active
                    lock held by the crashed hypervisor. We were able to remove locks
                    from "regular files", but it doesn't seem possible to remove locks
                    from shards.

                    We are running GlusterFS 3.8.15 on all nodes.

                    Here is the part of the statedump that shows a shard having an
                    active lock from the crashed node:

                    [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
                    path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
                    mandatory=0
                    inodelk-count=1
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
                    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24
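
                    For reference, a statedump like the one above can be generated
                    roughly as follows on a storage node (assuming the default dump
                    directory /var/run/gluster; the grep is just one way to pick out
                    the lock sections):

                    gluster volume statedump zone2-ssd1-vmstor1
                    grep -A 8 'locks.inode' /var/run/gluster/*.dump.*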

                    If we try to run clear-locks we get the following error message:
                    # gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
                    Volume clear-locks unsuccessful
                    clear-locks getxattr command failed. Reason: Operation not permitted

                    Gluster vol info if needed:
                    Volume Name: zone2-ssd1-vmstor1
                    Type: Replicate
                    Volume ID: b6319968-690b-4060-8fff-b212d2295208
                    Status: Started
                    Snapshot Count: 0
                    Number of Bricks: 1 x 2 = 2
                    Transport-type: rdma
                    Bricks:
                    Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
                    Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
                    Options Reconfigured:
                    cluster.shd-wait-qlength: 10000
                    cluster.shd-max-threads: 8
                    cluster.locking-scheme: granular
                    performance.low-prio-threads: 32
                    cluster.data-self-heal-algorithm: full
                    performance.client-io-threads: off
                    storage.linux-aio: off
                    performance.readdir-ahead: on
                    client.event-threads: 16
                    server.event-threads: 16
                    performance.strict-write-ordering: off
                    performance.quick-read: off
                    performance.read-ahead: on
                    performance.io-cache: off
                    performance.stat-prefetch: off
                    cluster.eager-lock: enable
                    network.remote-dio: on
                    cluster.quorum-type: none
                    network.ping-timeout: 22
                    performance.write-behind: off
                    nfs.disable: on
                    features.shard: on
                    features.shard-block-size: 512MB
                    storage.owner-uid: 36
                    storage.owner-gid: 36
                    performance.io-thread-count: 64
                    performance.cache-size: 2048MB
                    performance.write-behind-window-size: 256MB
                    server.allow-insecure: on
                    cluster.ensure-durability: off
                    config.transport: rdma
                    server.outstanding-rpc-limit: 512
                    diagnostics.brick-log-level: INFO

                    Any recommendations on how to proceed from here?

                    Best regards,
                    Samuli Heinonen

                    _______________________________________________
                    Gluster-users mailing list
                    Gluster-users@xxxxxxxxxxx
                    http://lists.gluster.org/mailman/listinfo/gluster-users


                    --

                    Pranith



                    --
                    Pranith
                    Samuli Heinonen <samppah@xxxxxxxxxxxxx>
                    21 January 2018 at 21.03
                    Hi again,

                    here is more information regarding the issue described earlier.

                    It looks like self healing is stuck. According to "heal
                    statistics" the crawl began at Sat Jan 20 12:56:19 2018 and it's
                    still going on (it's around Sun Jan 21 20:30 when writing this).
                    However, glustershd.log says that the last heal was completed at
                    "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also, "heal
                    info" has been running now for over 16 hours without printing any
                    information. In the statedump I can see that the storage nodes
                    have locks on files and some of those are blocked. I.e. here again
                    it says that ovirt8z2 is holding an active lock even though
                    ovirt8z2 crashed after the lock was granted:

                    [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
                    path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
                    mandatory=0
                    inodelk-count=3
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
                    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
                    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3420, owner=d8b9372c397f0000, client=0x7f8858410be0, connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
                    inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=d0c6d857a87f0000, client=0x7f885845efa0, connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52

                    I'd also like to add that the volume had an arbiter brick before
                    the crash happened. We decided to remove it because we thought
                    that it was causing issues. However, now I think that this was
                    unnecessary. After the crash the arbiter logs had lots of messages
                    like this:
                    [2018-01-20 10:19:36.515717] I [MSGID: 115072]
                    [server-rpc-fops.c:1640:server_setattr_cbk]
                    0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
                    <gfid:a52055bd-e2e9-42dd-92a3-e96b693bcafe>
                    (a52055bd-e2e9-42dd-92a3-e96b693bcafe) ==> (Operation not
                    permitted) [Operation not permitted]

                    Is there any way to force self heal to stop? Any help would be
                    very much appreciated :)

                    Best regards,
                    Samuli Heinonen

                    _______________________________________________
                    Gluster-users mailing list
                    Gluster-users@xxxxxxxxxxx
                    http://lists.gluster.org/mailman/listinfo/gluster-users

                    Samuli Heinonen <samppah@xxxxxxxxxxxxx>

                    20 January 2018 at 21.57
                    Hi all!

                    One hypervisor in our virtualization environment crashed and now
                    some of the VM images cannot be accessed. After investigation we
                    found out that there were lots of images that still had an active
                    lock held by the crashed hypervisor. We were able to remove locks
                    from "regular files", but it doesn't seem possible to remove locks
                    from shards.

                    We are running GlusterFS 3.8.15 on all nodes.

                    Here is the part of the statedump that shows a shard having an
                    active lock from the crashed node:
                    [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
                    path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
                    mandatory=0
                    inodelk-count=1
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
                    lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
                    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3568, owner=14ce372c397f0000, client=0x7f3198388770, connection-id=ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0, granted at 2018-01-20 08:57:24

                    If we try to run clear-locks we get the following error message:
                    # gluster volume clear-locks zone2-ssd1-vmstor1 /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21 kind all inode
                    Volume clear-locks unsuccessful
                    clear-locks getxattr command failed. Reason: Operation not permitted

                    Gluster vol info if needed:
                    Volume Name: zone2-ssd1-vmstor1
                    Type: Replicate
                    Volume ID: b6319968-690b-4060-8fff-b212d2295208
                    Status: Started
                    Snapshot Count: 0
                    Number of Bricks: 1 x 2 = 2
                    Transport-type: rdma
                    Bricks:
                    Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
                    Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
                    Options Reconfigured:
                    cluster.shd-wait-qlength: 10000
                    cluster.shd-max-threads: 8
                    cluster.locking-scheme: granular
                    performance.low-prio-threads: 32
                    cluster.data-self-heal-algorithm: full
                    performance.client-io-threads: off
                    storage.linux-aio: off
                    performance.readdir-ahead: on
                    client.event-threads: 16
                    server.event-threads: 16
                    performance.strict-write-ordering: off
                    performance.quick-read: off
                    performance.read-ahead: on
                    performance.io-cache: off
                    performance.stat-prefetch: off
                    cluster.eager-lock: enable
                    network.remote-dio: on
                    cluster.quorum-type: none
                    network.ping-timeout: 22
                    performance.write-behind: off
                    nfs.disable: on
                    features.shard: on
                    features.shard-block-size: 512MB
                    storage.owner-uid: 36
                    storage.owner-gid: 36
                    performance.io-thread-count: 64
                    performance.cache-size: 2048MB
                    performance.write-behind-window-size: 256MB
                    server.allow-insecure: on
                    cluster.ensure-durability: off
                    config.transport: rdma
                    server.outstanding-rpc-limit: 512
                    diagnostics.brick-log-level: INFO

                    Any recommendations on how to proceed from here?

                    Best regards,
                    Samuli Heinonen

                    _______________________________________________
                    Gluster-users mailing list
                    Gluster-users@xxxxxxxxxxx
                    http://lists.gluster.org/mailman/listinfo/gluster-users


            --

            Pranith






    --
    Pranith


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
