I was attempting the same on a local sandbox and ran into the same problem.

Current: 3.8.4

Volume Name: shchst01
Type: Distributed-Replicate
Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: shchhv01-sto:/data/brick3/shchst01
Brick2: shchhv02-sto:/data/brick3/shchst01
Brick3: shchhv03-sto:/data/brick3/shchst01
Brick4: shchhv01-sto:/data/brick1/shchst01
Brick5: shchhv02-sto:/data/brick1/shchst01
Brick6: shchhv03-sto:/data/brick1/shchst01
Brick7: shchhv02-sto:/data/brick2/shchst01
Brick8: shchhv03-sto:/data/brick2/shchst01
Brick9: shchhv04-sto:/data/brick2/shchst01
Brick10: shchhv02-sto:/data/brick4/shchst01
Brick11: shchhv03-sto:/data/brick4/shchst01
Brick12: shchhv04-sto:/data/brick4/shchst01
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.shard-block-size: 512MB
features.shard: enable
performance.readdir-ahead: on
storage.owner-uid: 9869
storage.owner-gid: 9869
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.self-heal-daemon: on
nfs.disable: on
performance.io-thread-count: 64
performance.cache-size: 1GB

Upgraded shchhv01-sto to 3.12.3; the others remain at 3.8.4.

RESULT
=====================
Hostname: shchhv01-sto
Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
State: Peer Rejected (Connected)

Upgraded Server: shchhv01-sto
==============================
[2017-12-20 05:02:44.747313] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-12-20 05:02:44.747387] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:54.676324] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
[2017-12-20 05:02:54.690237] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:54.695823] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
[2017-12-20 05:02:54.696956] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv02-sto
[2017-12-20 05:02:54.697796] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv02-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.033822] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
[2017-12-20 05:02:55.038460] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.040032] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
[2017-12-20 05:02:55.040266] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv03-sto
[2017-12-20 05:02:55.040405] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv03-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.584854] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
[2017-12-20 05:02:55.595125] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:55.600804] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
[2017-12-20 05:02:55.601288] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv04-sto
[2017-12-20 05:02:55.601497] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv04-sto (0), ret: 0, op_ret: -1
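Side note: the two checksums in those errors can be read straight off disk, since glusterd keeps the value it exchanges during the friend handshake in the volume's cksum file. A quick comparison (a sketch; it assumes the default /var/lib/glusterd working directory and that the file holds a single info=<value> line — the values shown are just the two checksums from the log above):

[root@shchhv01 ~]# cat /var/lib/glusterd/vols/shchst01/cksum
info=4218452135
[root@shchhv02 ~]# cat /var/lib/glusterd/vols/shchst01/cksum
info=2747317484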
Another Server: shchhv02-sto
==============================
[2017-12-20 05:02:44.667833] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f75fdc12e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f75fdc1ca08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f75fdcc57fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667795] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667948] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
[2017-12-20 05:02:44.760103] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:44.765389] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:54.686185] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01 differ. local cksum = 2747317484, remote cksum = 4218452135 on peer shchhv01-sto
[2017-12-20 05:02:54.686882] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:54.717854] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

Another Server: shchhv04-sto
==============================
[2017-12-20 05:02:44.667620] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667808] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f10a33d9e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f10a33e3a08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f10a348c7fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667827] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
[2017-12-20 05:02:44.760077] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
[2017-12-20 05:02:44.768796] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:55.595095] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 2747317484, remote cksum = 4218452135 on peer shchhv01-sto
[2017-12-20 05:02:55.595273] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.612957] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0
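All of the snippets above are from the glusterd log on each node (glusterd.log, or etc-glusterfs-glusterd.vol.log on some builds). If anyone wants to reproduce the extraction, something like this pulls out just the rejection and checksum lines (a sketch; default log directory assumed):

[root@shchhv01 ~]# grep -E 'Cksums|Peer Rejected|RJT' /var/log/glusterfs/glusterd.log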
<vol>/info

Upgraded Server: shchst01-sto
=========================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
tier-enabled=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
nfs.disable=on
cluster.self-heal-daemon=on
cluster.server-quorum-type=server
cluster.quorum-type=auto
network.remote-dio=enable
cluster.eager-lock=enable
performance.stat-prefetch=off
performance.io-cache=off
performance.read-ahead=off
performance.quick-read=off
server.allow-insecure=on
storage.owner-gid=9869
storage.owner-uid=9869
performance.readdir-ahead=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

Another Server: shchhv02-sto
==============================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
performance.readdir-ahead=on
storage.owner-uid=9869
storage.owner-gid=9869
server.allow-insecure=on
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
cluster.eager-lock=enable
network.remote-dio=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.self-heal-daemon=on
nfs.disable=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01
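Comparing the two info files above: the values all match, but the upgraded node's copy carries one extra line, tier-enabled=0, which the 3.8.4 copy does not have (the option ordering also differs). Since glusterd derives the volume checksum from this file, that extra line alone would plausibly account for local cksum 4218452135 vs remote 2747317484. To confirm on a live pair (a sketch; assumes default paths, bash process substitution, and ssh access between peers):

[root@shchhv01 ~]# diff /var/lib/glusterd/vols/shchst01/info \
                        <(ssh shchhv02-sto cat /var/lib/glusterd/vols/shchst01/info)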
NOTE
[root@shchhv01 shchst01]# gluster volume get shchst01 cluster.op-version
Warning: Support to get global option value using `volume get <volname>` will be deprecated from next release. Consider using `volume get all` instead for global options
Option                Value
------                -----
cluster.op-version    30800

[root@shchhv02 shchst01]# gluster volume get shchst01 cluster.op-version
Option                Value
------                -----
cluster.op-version    30800
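Both nodes still agree on cluster.op-version 30800, which matches the upgrade guide's expectation that the op-version is raised only after all nodes are upgraded. For reference, the remaining sequence would look roughly like this (a sketch, not verified here: it assumes systemd-managed glusterd, yum/dnf packaging, and 31200 as the op-version for 3.12.0 — check the guide Atin links below for the exact value):

# On each remaining 3.8.4 node, one node at a time:
systemctl stop glusterd
pkill glusterfs ; pkill glusterfsd        # stop any remaining client/brick processes
yum -y update glusterfs-server            # dnf on Fedora
systemctl start glusterd
gluster volume heal shchst01 info         # repeat until no unhealed entries remain

# Only after every node runs 3.12, from any one node:
gluster volume set all cluster.op-version 31200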
-----Original Message-----
From: gluster-users-bounces@xxxxxxxxxxx [mailto:gluster-users-bounces@xxxxxxxxxxx] On Behalf Of Ziemowit Pierzycki
Sent: Tuesday, December 19, 2017 3:56 PM
To: gluster-users <gluster-users@xxxxxxxxxxx>
Subject: Re: Upgrading from Gluster 3.8 to 3.12

I have not done the upgrade yet. Since this is a production cluster I
need to make sure it stays up, or schedule some downtime if it doesn't.
Thanks.

On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>
> On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki
> <ziemowit@xxxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I have a cluster of 10 servers all running Fedora 24 along with
>> Gluster 3.8. I'm planning on doing rolling upgrades to Fedora 27
>> with Gluster 3.12. I saw the documentation and did some testing, but
>> I would like to run my plan through some (more?) educated minds.
>>
>> The current setup is:
>>
>> Volume Name: vol0
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt01:/vol/vol0
>> Brick2: glt02:/vol/vol0
>> Brick3: glt05:/vol/vol0 (arbiter)
>> Brick4: glt03:/vol/vol0
>> Brick5: glt04:/vol/vol0
>> Brick6: glt06:/vol/vol0 (arbiter)
>>
>> Volume Name: vol1
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt07:/vol/vol1
>> Brick2: glt08:/vol/vol1
>> Brick3: glt05:/vol/vol1 (arbiter)
>> Brick4: glt09:/vol/vol1
>> Brick5: glt10:/vol/vol1
>> Brick6: glt06:/vol/vol1 (arbiter)
>>
>> After performing the upgrade, because of differences in checksums,
>> the upgraded nodes will become:
>>
>> State: Peer Rejected (Connected)
>
> Have you upgraded all the nodes? If yes, have you bumped up the
> cluster.op-version after upgrading all the nodes? Please follow
> http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more
> details on how to bump up the cluster.op-version. In case you have
> done all of these and you're seeing a checksum issue then I'm afraid
> you have hit a bug. I'd need further details like the checksum
> mismatch error from the glusterd.log file along with the exact
> volume's info file from /var/lib/glusterd/vols/<volname>/info from
> both the peers to debug this further.
>
>> If I start doing the upgrades one at a time, with nodes glt10 to
>> glt01 except for the arbiters glt05 and glt06, and then upgrading
>> the arbiters last, everything should remain online at all times
>> through the process. Correct?
>>
>> Thanks.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://lists.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users