Re: Upgrading from Gluster 3.8 to 3.12

This looks like a bug: tier-enabled = 0 is an additional entry in the info file on shchhv01. As per the code, this field should be written into the glusterd store if the op-version is >= 30706. My guess is that because 3.8.4 did not have commit 33f8703a1 ("glusterd: regenerate volfiles on op-version bump up"), the info and volfiles were not regenerated when the op-version was bumped, which caused the tier-enabled entry to be missing in the info file.
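
A quick way to confirm this on each peer is to check the cluster op-version and look for the tier-enabled line in the volume's info file; roughly (a sketch using the shchst01 volume and the info path mentioned later in this thread, adjust for your volume names):

gluster volume get shchst01 cluster.op-version
# the upgraded node's info file should show the extra tier-enabled line,
# while the 3.8.4 peers should not
grep -H 'tier-enabled' /var/lib/glusterd/vols/shchst01/info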

For now, you can copy the info file for the volumes where the mismatch happened from shchhv01 to shchhv02 and restart the glusterd service on shchhv02. That should fix this up temporarily. Unfortunately, this step might need to be repeated on the other nodes as well.
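
Something along these lines should do it (a rough sketch, assuming the shchst01 volume, ssh access between the nodes and systemd; back up the file first and adjust hostnames/paths for your setup):

# on shchhv02: keep a backup of the current info file
cp /var/lib/glusterd/vols/shchst01/info /var/lib/glusterd/vols/shchst01/info.bak
# pull the info file from the upgraded node
scp shchhv01-sto:/var/lib/glusterd/vols/shchst01/info /var/lib/glusterd/vols/shchst01/info
# restart glusterd so it re-reads the store
systemctl restart glusterd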

@Hari - Could you help debug this further?



On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl <gustave@xxxxxxxxxxxxxx> wrote:
I attempted the same upgrade on a local sandbox and ran into the same problem.


Current: 3.8.4

Volume Name: shchst01
Type: Distributed-Replicate
Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: shchhv01-sto:/data/brick3/shchst01
Brick2: shchhv02-sto:/data/brick3/shchst01
Brick3: shchhv03-sto:/data/brick3/shchst01
Brick4: shchhv01-sto:/data/brick1/shchst01
Brick5: shchhv02-sto:/data/brick1/shchst01
Brick6: shchhv03-sto:/data/brick1/shchst01
Brick7: shchhv02-sto:/data/brick2/shchst01
Brick8: shchhv03-sto:/data/brick2/shchst01
Brick9: shchhv04-sto:/data/brick2/shchst01
Brick10: shchhv02-sto:/data/brick4/shchst01
Brick11: shchhv03-sto:/data/brick4/shchst01
Brick12: shchhv04-sto:/data/brick4/shchst01
Options Reconfigured:
cluster.data-self-heal-algorithm: full
features.shard-block-size: 512MB
features.shard: enable
performance.readdir-ahead: on
storage.owner-uid: 9869
storage.owner-gid: 9869
server.allow-insecure: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.self-heal-daemon: on
nfs.disable: on
performance.io-thread-count: 64
performance.cache-size: 1GB

Upgraded shchhv01-sto to 3.12.3, others remain at 3.8.4

RESULT
=====================
Hostname: shchhv01-sto
Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
State: Peer Rejected (Connected)

Upgraded Server:  shchhv01-sto
==============================
[2017-12-20 05:02:44.747313] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 1
[2017-12-20 05:02:44.747387] I [MSGID: 101190]
[event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with
index 2
[2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk]
0-management: RPC_CLNT_PING notify failed
[2017-12-20 05:02:54.676324] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
[2017-12-20 05:02:54.690237] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:54.695823] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
[2017-12-20 05:02:54.696956] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv02-sto
[2017-12-20 05:02:54.697796] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv02-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.033822] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
[2017-12-20 05:02:55.038460] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:55.040032] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
[2017-12-20 05:02:55.040266] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv03-sto
[2017-12-20 05:02:55.040405] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv03-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.584854] I [MSGID: 106493]
[glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
[2017-12-20 05:02:55.595125] I [MSGID: 106163]
[glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:55.600804] I [MSGID: 106490]
[glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
[2017-12-20 05:02:55.601288] E [MSGID: 106010]
[glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum =
2747317484 on peer shchhv04-sto
[2017-12-20 05:02:55.601497] I [MSGID: 106493]
[glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv04-sto (0), ret: 0, op_ret: -1

Another Server:  shchhv02-sto
==============================
[2017-12-20 05:02:44.667833] W
[glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
(-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c)
[0x7f75fdc12e5c]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08)
[0x7f75fdc1ca08]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa)
[0x7f75fdcc57fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667795] I [MSGID: 106004]
[glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer
<shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer
Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667948] W [MSGID: 106118]
[glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not
released for shchst01-sto
[2017-12-20 05:02:44.760103] I [MSGID: 106163]
[glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:44.765389] I [MSGID: 106490]
[glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:54.686185] E [MSGID: 106010]
[glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01 differ. local cksum = 2747317484, remote cksum =
4218452135 on peer shchhv01-sto
[2017-12-20 05:02:54.686882] I [MSGID: 106493]
[glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:54.717854] I [MSGID: 106493]
[glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

Another Server:  shchhv04-sto
==============================
[2017-12-20 05:02:44.667620] I [MSGID: 106004]
[glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer
<shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer
Rejected>, has disconnected from glusterd.
[2017-12-20 05:02:44.667808] W
[glusterd-locks.c:675:glusterd_mgmt_v3_unlock]
(-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c)
[0x7f10a33d9e5c]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08)
[0x7f10a33e3a08]
-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa)
[0x7f10a348c7fa] ) 0-management: Lock for vol shchst01-sto not held
[2017-12-20 05:02:44.667827] W [MSGID: 106118]
[glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not
released for shchst01-sto
[2017-12-20 05:02:44.760077] I [MSGID: 106163]
[glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management:
using the op-version 30800
[2017-12-20 05:02:44.768796] I [MSGID: 106490]
[glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd:
Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
[2017-12-20 05:02:55.595095] E [MSGID: 106010]
[glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version
of Cksums shchst01-sto differ. local cksum = 2747317484, remote cksum =
4218452135 on peer shchhv01-sto
[2017-12-20 05:02:55.595273] I [MSGID: 106493]
[glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd:
Responded to shchhv01-sto (0), ret: 0, op_ret: -1
[2017-12-20 05:02:55.612957] I [MSGID: 106493]
[glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT
from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0

<vol>/info

Upgraded Server: shchhv01-sto
=========================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
tier-enabled=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
nfs.disable=on
cluster.self-heal-daemon=on
cluster.server-quorum-type=server
cluster.quorum-type=auto
network.remote-dio=enable
cluster.eager-lock=enable
performance.stat-prefetch=off
performance.io-cache=off
performance.read-ahead=off
performance.quick-read=off
server.allow-insecure=on
storage.owner-gid=9869
storage.owner-uid=9869
performance.readdir-ahead=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

Another Server:  shchhv02-sto
==============================
type=2
count=12
status=1
sub_count=3
stripe_count=1
replica_count=3
disperse_count=0
redundancy_count=0
version=52
transport-type=0
volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
password=58652573-0955-4d00-893a-9f42d0f16717
op-version=30700
client-op-version=30700
quota-version=0
parent_volname=N/A
restored_from_snap=00000000-0000-0000-0000-000000000000
snap-max-hard-limit=256
cluster.data-self-heal-algorithm=full
features.shard-block-size=512MB
features.shard=enable
performance.readdir-ahead=on
storage.owner-uid=9869
storage.owner-gid=9869
server.allow-insecure=on
performance.quick-read=off
performance.read-ahead=off
performance.io-cache=off
performance.stat-prefetch=off
cluster.eager-lock=enable
network.remote-dio=enable
cluster.quorum-type=auto
cluster.server-quorum-type=server
cluster.self-heal-daemon=on
nfs.disable=on
performance.io-thread-count=64
performance.cache-size=1GB
brick-0=shchhv01-sto:-data-brick3-shchst01
brick-1=shchhv02-sto:-data-brick3-shchst01
brick-2=shchhv03-sto:-data-brick3-shchst01
brick-3=shchhv01-sto:-data-brick1-shchst01
brick-4=shchhv02-sto:-data-brick1-shchst01
brick-5=shchhv03-sto:-data-brick1-shchst01
brick-6=shchhv02-sto:-data-brick2-shchst01
brick-7=shchhv03-sto:-data-brick2-shchst01
brick-8=shchhv04-sto:-data-brick2-shchst01
brick-9=shchhv02-sto:-data-brick4-shchst01
brick-10=shchhv03-sto:-data-brick4-shchst01
brick-11=shchhv04-sto:-data-brick4-shchst01

NOTE

[root@shchhv01 shchst01]# gluster volume get shchst01 cluster.op-version
Warning: Support to get global option value using `volume get <volname>`
will be deprecated from next release. Consider using `volume get all`
instead for global options
Option                                  Value
------                                  -----
cluster.op-version                      30800

[root@shchhv02 shchst01]# gluster volume get shchst01 cluster.op-version
Option                                  Value
------                                  -----
cluster.op-version                      30800

-----Original Message-----
From: gluster-users-bounces@gluster.org
[mailto:gluster-users-bounces@gluster.org] On Behalf Of Ziemowit Pierzycki
Sent: Tuesday, December 19, 2017 3:56 PM
To: gluster-users <gluster-users@xxxxxxxxxxx>
Subject: Re: Upgrading from Gluster 3.8 to 3.12

I have not done the upgrade yet.  Since this is a production cluster I need
to make sure it stays up, or schedule some downtime if it doesn't.
Thanks.

On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee <amukherj@xxxxxxxxxx>
wrote:
>
>
> On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki
> <ziemowit@xxxxxxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> I have a cluster of 10 servers all running Fedora 24 along with
>> Gluster 3.8.  I'm planning on doing rolling upgrades to Fedora 27
>> with Gluster 3.12.  I saw the documentation and did some testing but
>> I would like to run my plan through some (more?) educated minds.
>>
>> The current setup is:
>>
>> Volume Name: vol0
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt01:/vol/vol0
>> Brick2: glt02:/vol/vol0
>> Brick3: glt05:/vol/vol0 (arbiter)
>> Brick4: glt03:/vol/vol0
>> Brick5: glt04:/vol/vol0
>> Brick6: glt06:/vol/vol0 (arbiter)
>>
>> Volume Name: vol1
>> Distributed-Replicate
>> Number of Bricks: 2 x (2 + 1) = 6
>> Bricks:
>> Brick1: glt07:/vol/vol1
>> Brick2: glt08:/vol/vol1
>> Brick3: glt05:/vol/vol1 (arbiter)
>> Brick4: glt09:/vol/vol1
>> Brick5: glt10:/vol/vol1
>> Brick6: glt06:/vol/vol1 (arbiter)
>>
>> After performing the upgrade because of differences in checksums, the
>> upgraded nodes will become:
>>
>> State: Peer Rejected (Connected)
>
>
> Have you upgraded all the nodes? If yes, have you bumped up the
> cluster.op-version after upgrading all the nodes? Please follow :
> http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more
> details on how to bump up the cluster.op-version. In case you have
> done all of these and you're seeing a checksum issue then I'm afraid
> you have hit a bug. I'd need further details like the checksum
> mismatch error from glusterd.log file along with the exact
> volume's info file from /var/lib/glusterd/vols/<volname>/info between
> both the peers to debug this further.
>
>>
>> If I start doing the upgrades one at a time, with nodes glt10 to
>> glt01 except for the arbiters glt05 and glt06, and then upgrading the
>> arbiters last, everything should remain online at all times through
>> the process.  Correct?
>>
>> Thanks.
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
