Re: Upgrading from Gluster 3.8 to 3.12

This looks like a bug: tier-enabled = 0 is an additional entry in the info file on shchhv01. As per the code, this field should be written into the glusterd store if the op-version is >= 30706. My guess is that because 3.8.4 did not have commit 33f8703a1 ("glusterd: regenerate volfiles on op-version bump up"), the info and volfiles were not regenerated when the op-version was bumped, which caused the tier-enabled entry to be missing in the info file.
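
A quick way to confirm this on each peer is to check the cluster op-version and look for the tier-enabled line in the volume's info file; roughly (a sketch using the shchst01 volume and the info path mentioned later in this thread, adjust for your volume names):

gluster volume get shchst01 cluster.op-version
# the upgraded node's info file should show the extra tier-enabled line,
# while the 3.8.4 peers should not
grep -H 'tier-enabled' /var/lib/glusterd/vols/shchst01/info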

For now, you can copy the info file for the volumes where the mismatch happened from shchhv01 to shchhv02 and restart the glusterd service on shchhv02. That should fix this up temporarily. Unfortunately, this step might need to be repeated on the other nodes as well.
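
Something along these lines should do it (a rough sketch, assuming the shchst01 volume, ssh access between the nodes and systemd; back up the file first and adjust hostnames/paths for your setup):

# on shchhv02: keep a backup of the current info file
cp /var/lib/glusterd/vols/shchst01/info /var/lib/glusterd/vols/shchst01/info.bak
# pull the info file from the upgraded node
scp shchhv01-sto:/var/lib/glusterd/vols/shchst01/info /var/lib/glusterd/vols/shchst01/info
# restart glusterd so it re-reads the store
systemctl restart glusterd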

@Hari - Could you help debug this further?



On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl <gustave@xxxxxxxxxxxxxx> wrote:
I attempted the same upgrade on a local sandbox and ran into the same problem.


Current: 3.8.4

Volume Name: shchst01
Type: Distributed-Replicate
Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: shchhv01-sto:/data/brick3/shchst01
Brick2: shchhv02-sto:/data/brick3/shchst01
Brick3: shchhv03-sto:/data/brick3/shchst01
Brick4: shchhv01-sto:/data/brick1/shchst01
Brick5: shchhv02-sto:/data/brick1/shchst01
Brick6: shchhv03-sto:/data/brick1/shchst01
Brick7: shchhv02-sto:/data/brick2/shchst01
Brick8: shchhv03-sto:/data/brick2/shchst01
Brick9: shchhv04-sto:/data/brick2/shchst01
Brick10: shchhv02-sto:/data/brick4/shchst01
Brick11: shchhv03-sto:/data/brick4/shchst01
Brick12: shchhv04-sto:/data/brick4/shchst01
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.shard-block-size: 512MB
features.shard: enable
performance.readdir-ahead: on
storage.owner-uid: 9869
storage.owner-gid: 9869
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.self-heal-daemon: on
nfs.disable: on
performance.io-thread-count: 64
performance.cache-size: 1GB

Upgraded shchhv01-sto to 3.12.3, others remain at 3.8.4

RESULT
=====================
Hostname: shchhv01-sto
Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
State: Peer Rejected (Connected)

Upgraded Server:  shchhv01-sto
==============================
[2017-12-20 05:02:44.747313] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2017-12-20 05:02:44.747387] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 2
[2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:54.676324] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
[2017-12-20 05:02:54.690237] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:54.695823] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
[2017-12-20 05:02:54.696956] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv02-sto
[2017-12-20 05:02:54.697796] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv02-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.033822] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
[2017-12-20 05:02:55.038460] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:55.040032] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
[2017-12-20 05:02:55.040266] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv03-sto
[2017-12-20 05:02:55.040405] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv03-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.584854] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
[2017-12-20 05:02:55.595125] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:55.600804] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
[2017-12-20 05:02:55.601288] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv04-sto
[2017-12-20 05:02:55.601497] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv04-sto (0), ret: 0, op_ret: -1

Another Server:  shchhv02-sto
==============================
[2017-12-20 05:02:44.667833] W
[glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
(-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c)
[0x7f75fdc12e5c]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08)
[0x7f75fdc1ca08]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa)
[0x7f75fdcc57fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667795] I [MSGID: 106004]
[glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer
<shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer
Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667948] W [MSGID: 106118]
[glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not
released for shchst01-sto
[2017-12-20 05:02:44.760103] I [MSGID: 106163]
[glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:44.765389] I [MSGID: 106490]
[glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:54.686185] E [MSGID: 106010]
[glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01 differ. local cksum = 2747317484, remote cksum =
4218452135 on peer shchhv01-sto
[2017-12-20 05:02:54.686882] I [MSGID: 106493]
[glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:54.717854] I [MSGID: 106493]
[glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

Another Server:  shchhv04-sto
==============================
[2017-12-20 05:02:44.667620] I [MSGID: 106004]
[glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer
<shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer
Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667808] W
[glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
(-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c)
[0x7f10a33d9e5c]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08)
[0x7f10a33e3a08]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa)
[0x7f10a348c7fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667827] W [MSGID: 106118]
[glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not
released for shchst01-sto
[2017-12-20 05:02:44.760077] I [MSGID: 106163]
[glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:44.768796] I [MSGID: 106490]
[glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:55.595095] E [MSGID: 106010]
[glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 2747317484, remote cksum =
4218452135 on peer shchhv01-sto
[2017-12-20 05:02:55.595273] I [MSGID: 106493]
[glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.612957] I [MSGID: 106493]
[glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

<vol>/info

Upgraded Server: shchhv01-sto
=========================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
tier-enabled=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
nfs.disable=on
cluster.self-heal-daemon=on
cluster.server-quorum-type=server
cluster.quorum-type=auto
network.remote-dio=enable
cluster.eager-lock=enable
performance.stat-prefetch=off
performance.io-cache=off
performance.read-ahead=off
performance.quick-read=off
server.allow-insecure=on
storage.owner-gid=9869
storage.owner-uid=9869
performance.readdir-ahead=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

Another Server:  shchhv02-sto
==============================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
performance.readdir-ahead=on
storage.owner-uid=9869
storage.owner-gid=9869
server.allow-insecure=on
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
cluster.eager-lock=enable
network.remote-dio=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.self-heal-daemon=on
nfs.disable=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

NOTE

[root@shchhv01 shchst01]# gluster volume get shchst01 cluster.op-version
Warning: Support to get global option value using `volume get <volname>`
will be deprecated from next release. Consider using `volume get all`
instead for global options
Option                                  Value
------                                  -----
cluster.op-version                      30800

[root@shchhv02 shchst01]# gluster volume get shchst01 cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      30800

-----Original Message-----
From: gluster-users-bounces@gluster.org
[mailto:gluster-users-bounces@gluster.org] On Behalf Of Ziemowit Pierzycki
Sent: Tuesday, December 19, 2017 3:56 PM
To: gluster-users <gluster-users@xxxxxxxxxxx>
Subject: Re: Upgrading from Gluster 3.8 to 3.12

I have not done the upgrade yet.  Since this is a production cluster I need
to make sure it stays up, or schedule some downtime if it doesn't.
Thanks.

On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee <amukherj@xxxxxxxxxx>
wrote:
>
>
> On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki
> <ziemowit@xxxxxxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> I have a cluster of 10 servers all running Fedora 24 along with
>> Gluster 3.8.  I'm planning on doing rolling upgrades to Fedora 27
>> with Gluster 3.12.  I saw the documentation and did some testing but
>> I would like to run my plan through some (more?) educated minds.
>>
>> The current setup is:
>>
>> Volume Name: vol0
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt01:/vol/vol0
>> Brick2: glt02:/vol/vol0
>> Brick3: glt05:/vol/vol0 (arbiter)
>> Brick4: glt03:/vol/vol0
>> Brick5: glt04:/vol/vol0
>> Brick6: glt06:/vol/vol0 (arbiter)
>>
>> Volume Name: vol1
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt07:/vol/vol1
>> Brick2: glt08:/vol/vol1
>> Brick3: glt05:/vol/vol1 (arbiter)
>> Brick4: glt09:/vol/vol1
>> Brick5: glt10:/vol/vol1
>> Brick6: glt06:/vol/vol1 (arbiter)
>>
>> After performing the upgrade because of differences in checksums, the
>> upgraded nodes will become:
>>
>> State: Peer Rejected (Connected)
>
>
> Have you upgraded all the nodes? If yes, have you bumped up the
> cluster.op-version after upgrading all the nodes? Please follow :
> http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more
> details on how to bump up the cluster.op-version. In case you have
> done all of these and you're seeing a checksum issue then I'm afraid
> you have hit a bug. I'd need further details like the checksum
> mismatch error from glusterd.log file along with the exact
> volume's info file from /var/lib/glusterd/vols/<volname>/info between
> both the peers to debug this further.
>
>>
>> If I start doing the upgrades one at a time, with nodes glt10 to
>> glt01 except for the arbiters glt05 and glt06, and then upgrading the
>> arbiters last, everything should remain online at all times through
>> the process.  Correct?
>>
>> Thanks.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
