Yes Atin. I'll take a look.

On Wed, Dec 20, 2017 at 11:28 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
> Looks like a bug, as I see that tier-enabled = 0 is an additional entry in
> the info file on shchhv01. As per the code, this field should be written
> into the glusterd store if the op-version is >= 30706. My guess is that,
> since 3.8.4 did not have commit 33f8703a1 "glusterd: regenerate volfiles on
> op-version bump up", the info and volfiles were not regenerated when the
> op-version was bumped up, which left the tier-enabled entry missing from
> the info file.
>
> For now, you can copy the info file for the volumes where the mismatch
> happened from shchhv01 to shchhv02 and restart the glusterd service on
> shchhv02. That should fix this up temporarily. Unfortunately this step
> might need to be repeated for the other nodes as well.
>
> @Hari - Could you help in debugging this further?
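A rough sketch of that workaround for anyone following along (this assumes the
default /var/lib/glusterd layout, that shchst01 is the affected volume, that
the nodes run systemd, and that root ssh/scp between them works; adjust
hostnames, volume names and service commands to your setup):

    # On the rejected peer (shchhv02-sto in this example):
    systemctl stop glusterd

    # Keep a backup of the current metadata before overwriting anything.
    cp /var/lib/glusterd/vols/shchst01/info /var/lib/glusterd/vols/shchst01/info.bak

    # Pull the info file from the node that already has the tier-enabled entry.
    scp shchhv01-sto:/var/lib/glusterd/vols/shchst01/info \
        /var/lib/glusterd/vols/shchst01/info

    # Start glusterd again so it re-reads the updated store.
    systemctl start glusterd

Afterwards, "gluster peer status" on the other nodes should show whether the
peer has moved out of the Rejected state.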
>
> On Wed, Dec 20, 2017 at 10:44 AM, Gustave Dahl <gustave@xxxxxxxxxxxxxx> wrote:
>>
>> I was attempting the same on a local sandbox and also have the same
>> problem.
>>
>> Current: 3.8.4
>>
>> Volume Name: shchst01
>> Type: Distributed-Replicate
>> Volume ID: bcd53e52-cde6-4e58-85f9-71d230b7b0d3
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 4 x 3 = 12
>> Transport-type: tcp
>> Bricks:
>> Brick1: shchhv01-sto:/data/brick3/shchst01
>> Brick2: shchhv02-sto:/data/brick3/shchst01
>> Brick3: shchhv03-sto:/data/brick3/shchst01
>> Brick4: shchhv01-sto:/data/brick1/shchst01
>> Brick5: shchhv02-sto:/data/brick1/shchst01
>> Brick6: shchhv03-sto:/data/brick1/shchst01
>> Brick7: shchhv02-sto:/data/brick2/shchst01
>> Brick8: shchhv03-sto:/data/brick2/shchst01
>> Brick9: shchhv04-sto:/data/brick2/shchst01
>> Brick10: shchhv02-sto:/data/brick4/shchst01
>> Brick11: shchhv03-sto:/data/brick4/shchst01
>> Brick12: shchhv04-sto:/data/brick4/shchst01
>> Options Reconfigured:
>> cluster.data-self-heal-algorithm: full
>> features.shard-block-size: 512MB
>> features.shard: enable
>> performance.readdir-ahead: on
>> storage.owner-uid: 9869
>> storage.owner-gid: 9869
>> server.allow-insecure: on
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.self-heal-daemon: on
>> nfs.disable: on
>> performance.io-thread-count: 64
>> performance.cache-size: 1GB
>>
>> Upgraded shchhv01-sto to 3.12.3, others remain at 3.8.4
>>
>> RESULT
>> =====================
>> Hostname: shchhv01-sto
>> Uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
>> State: Peer Rejected (Connected)
>>
>> Upgraded Server: shchhv01-sto
>> ==============================
>> [2017-12-20 05:02:44.747313] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2017-12-20 05:02:44.747387] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>> [2017-12-20 05:02:44.749087] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
>> [2017-12-20 05:02:44.749165] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
>> [2017-12-20 05:02:44.749563] W [rpc-clnt-ping.c:246:rpc_clnt_ping_cbk] 0-management: RPC_CLNT_PING notify failed
>> [2017-12-20 05:02:54.676324] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272, host: shchhv02-sto, port: 0
>> [2017-12-20 05:02:54.690237] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
>> [2017-12-20 05:02:54.695823] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 546503ae-ba0e-40d4-843f-c5dbac22d272
>> [2017-12-20 05:02:54.696956] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv02-sto
>> [2017-12-20 05:02:54.697796] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv02-sto (0), ret: 0, op_ret: -1
>> [2017-12-20 05:02:55.033822] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b, host: shchhv03-sto, port: 0
>> [2017-12-20 05:02:55.038460] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
>> [2017-12-20 05:02:55.040032] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 3de22cb5-c1c1-4041-a1e1-eb969afa9b4b
>> [2017-12-20 05:02:55.040266] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv03-sto
>> [2017-12-20 05:02:55.040405] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv03-sto (0), ret: 0, op_ret: -1
>> [2017-12-20 05:02:55.584854] I [MSGID: 106493] [glusterd-rpc-ops.c:486:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5, host: shchhv04-sto, port: 0
>> [2017-12-20 05:02:55.595125] I [MSGID: 106163] [glusterd-handshake.c:1316:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
>> [2017-12-20 05:02:55.600804] I [MSGID: 106490] [glusterd-handler.c:2540:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 36306e37-d7f0-4fec-9140-0d0f1bd2d2d5
>> [2017-12-20 05:02:55.601288] E [MSGID: 106010] [glusterd-utils.c:3370:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 4218452135, remote cksum = 2747317484 on peer shchhv04-sto
>> [2017-12-20 05:02:55.601497] I [MSGID: 106493] [glusterd-handler.c:3800:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv04-sto (0), ret: 0, op_ret: -1
>>
>> Another Server: shchhv02-sto
>> ==============================
>> [2017-12-20 05:02:44.667833] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f75fdc12e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f75fdc1ca08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f75fdcc57fa] ) 0-management: Lock for vol shchst01-sto not held
>> [2017-12-20 05:02:44.667795] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
>> [2017-12-20 05:02:44.667948] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
>> [2017-12-20 05:02:44.760103] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
>> [2017-12-20 05:02:44.765389] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
>> [2017-12-20 05:02:54.686185] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01 differ. local cksum = 2747317484, remote cksum = 4218452135 on peer shchhv01-sto
>> [2017-12-20 05:02:54.686882] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
>> [2017-12-20 05:02:54.717854] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0
>>
>> Another Server: shchhv04-sto
>> ==============================
>> [2017-12-20 05:02:44.667620] I [MSGID: 106004] [glusterd-handler.c:5219:__glusterd_peer_rpc_notify] 0-management: Peer <shchhv01-sto> (<f6205edb-a0ea-4247-9594-c4cdc0d05816>), in state <Peer Rejected>, has disconnected from glusterd.
>> [2017-12-20 05:02:44.667808] W [glusterd-locks.c:675:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x1de5c) [0x7f10a33d9e5c] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x27a08) [0x7f10a33e3a08] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xd07fa) [0x7f10a348c7fa] ) 0-management: Lock for vol shchst01-sto not held
>> [2017-12-20 05:02:44.667827] W [MSGID: 106118] [glusterd-handler.c:5241:__glusterd_peer_rpc_notify] 0-management: Lock not released for shchst01-sto
>> [2017-12-20 05:02:44.760077] I [MSGID: 106163] [glusterd-handshake.c:1271:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30800
>> [2017-12-20 05:02:44.768796] I [MSGID: 106490] [glusterd-handler.c:2608:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816
>> [2017-12-20 05:02:55.595095] E [MSGID: 106010] [glusterd-utils.c:2930:glusterd_compare_friend_volume] 0-management: Version of Cksums shchst01-sto differ. local cksum = 2747317484, remote cksum = 4218452135 on peer shchhv01-sto
>> [2017-12-20 05:02:55.595273] I [MSGID: 106493] [glusterd-handler.c:3852:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to shchhv01-sto (0), ret: 0, op_ret: -1
>> [2017-12-20 05:02:55.612957] I [MSGID: 106493] [glusterd-rpc-ops.c:476:__glusterd_friend_add_cbk] 0-glusterd: Received RJT from uuid: f6205edb-a0ea-4247-9594-c4cdc0d05816, host: shchhv01-sto, port: 0
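Quick aside on the "Version of Cksums ... differ" errors above: the checksum
each glusterd compares is kept on disk next to the volume metadata, so you can
confirm which peers disagree before digging into the info files that follow.
A sketch, assuming root ssh between the nodes and the default /var/lib/glusterd
layout:

    for h in shchhv01-sto shchhv02-sto shchhv03-sto shchhv04-sto; do
        echo "== $h"
        ssh "$h" cat /var/lib/glusterd/vols/shchst01/cksum
    done

In this reproduction the upgraded node would be expected to report a different
info checksum than the three peers still on 3.8.4.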
>>
>> <vol>/info
>>
>> Upgraded Server: shchhv01-sto
>> =========================
>> type=2
>> count=12
>> status=1
>> sub_count=3
>> stripe_count=1
>> replica_count=3
>> disperse_count=0
>> redundancy_count=0
>> version=52
>> transport-type=0
>> volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
>> username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
>> password=58652573-0955-4d00-893a-9f42d0f16717
>> op-version=30700
>> client-op-version=30700
>> quota-version=0
>> tier-enabled=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> cluster.data-self-heal-algorithm=full
>> features.shard-block-size=512MB
>> features.shard=enable
>> nfs.disable=on
>> cluster.self-heal-daemon=on
>> cluster.server-quorum-type=server
>> cluster.quorum-type=auto
>> network.remote-dio=enable
>> cluster.eager-lock=enable
>> performance.stat-prefetch=off
>> performance.io-cache=off
>> performance.read-ahead=off
>> performance.quick-read=off
>> server.allow-insecure=on
>> storage.owner-gid=9869
>> storage.owner-uid=9869
>> performance.readdir-ahead=on
>> performance.io-thread-count=64
>> performance.cache-size=1GB
>> brick-0=shchhv01-sto:-data-brick3-shchst01
>> brick-1=shchhv02-sto:-data-brick3-shchst01
>> brick-2=shchhv03-sto:-data-brick3-shchst01
>> brick-3=shchhv01-sto:-data-brick1-shchst01
>> brick-4=shchhv02-sto:-data-brick1-shchst01
>> brick-5=shchhv03-sto:-data-brick1-shchst01
>> brick-6=shchhv02-sto:-data-brick2-shchst01
>> brick-7=shchhv03-sto:-data-brick2-shchst01
>> brick-8=shchhv04-sto:-data-brick2-shchst01
>> brick-9=shchhv02-sto:-data-brick4-shchst01
>> brick-10=shchhv03-sto:-data-brick4-shchst01
>> brick-11=shchhv04-sto:-data-brick4-shchst01
>>
>> Another Server: shchhv02-sto
>> ==============================
>> type=2
>> count=12
>> status=1
>> sub_count=3
>> stripe_count=1
>> replica_count=3
>> disperse_count=0
>> redundancy_count=0
>> version=52
>> transport-type=0
>> volume-id=bcd53e52-cde6-4e58-85f9-71d230b7b0d3
>> username=5a4ae8d8-dbcb-408e-ab73-629255c14ffc
>> password=58652573-0955-4d00-893a-9f42d0f16717
>> op-version=30700
>> client-op-version=30700
>> quota-version=0
>> parent_volname=N/A
>> restored_from_snap=00000000-0000-0000-0000-000000000000
>> snap-max-hard-limit=256
>> cluster.data-self-heal-algorithm=full
>> features.shard-block-size=512MB
>> features.shard=enable
>> performance.readdir-ahead=on
>> storage.owner-uid=9869
>> storage.owner-gid=9869
>> server.allow-insecure=on
>> performance.quick-read=off
>> performance.read-ahead=off
>> performance.io-cache=off
>> performance.stat-prefetch=off
>> cluster.eager-lock=enable
>> network.remote-dio=enable
>> cluster.quorum-type=auto
>> cluster.server-quorum-type=server
>> cluster.self-heal-daemon=on
>> nfs.disable=on
>> performance.io-thread-count=64
>> performance.cache-size=1GB
>> brick-0=shchhv01-sto:-data-brick3-shchst01
>> brick-1=shchhv02-sto:-data-brick3-shchst01
>> brick-2=shchhv03-sto:-data-brick3-shchst01
>> brick-3=shchhv01-sto:-data-brick1-shchst01
>> brick-4=shchhv02-sto:-data-brick1-shchst01
>> brick-5=shchhv03-sto:-data-brick1-shchst01
>> brick-6=shchhv02-sto:-data-brick2-shchst01
>> brick-7=shchhv03-sto:-data-brick2-shchst01
>> brick-8=shchhv04-sto:-data-brick2-shchst01
>> brick-9=shchhv02-sto:-data-brick4-shchst01
>> brick-10=shchhv03-sto:-data-brick4-shchst01
>> brick-11=shchhv04-sto:-data-brick4-shchst01
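Side note: with both info files in hand, a straight diff makes the mismatch
easy to see. A sketch, run from shchhv01-sto and assuming ssh access to
shchhv02-sto plus the default paths:

    ssh shchhv02-sto cat /var/lib/glusterd/vols/shchst01/info | \
        diff /var/lib/glusterd/vols/shchst01/info -

Going by Atin's explanation at the top of the thread, the only difference
reported should be the tier-enabled=0 line that exists on shchhv01 but not on
shchhv02.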
>>
>> NOTE
>>
>> [root@shchhv01 shchst01]# gluster volume get shchst01 cluster.op-version
>> Warning: Support to get global option value using `volume get <volname>`
>> will be deprecated from next release. Consider using `volume get all`
>> instead for global options
>> Option                                   Value
>> ------                                   -----
>> cluster.op-version                       30800
>>
>> [root@shchhv02 shchst01]# gluster volume get shchst01 cluster.op-version
>> Option                                   Value
>> ------                                   -----
>> cluster.op-version                       30800
>>
>> -----Original Message-----
>> From: gluster-users-bounces@xxxxxxxxxxx
>> [mailto:gluster-users-bounces@xxxxxxxxxxx] On Behalf Of Ziemowit Pierzycki
>> Sent: Tuesday, December 19, 2017 3:56 PM
>> To: gluster-users <gluster-users@xxxxxxxxxxx>
>> Subject: Re: Upgrading from Gluster 3.8 to 3.12
>>
>> I have not done the upgrade yet. Since this is a production cluster, I need
>> to make sure it stays up, or schedule some downtime if it doesn't.
>> Thanks.
>>
>> On Tue, Dec 19, 2017 at 10:11 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>> >
>> > On Tue, Dec 19, 2017 at 1:10 AM, Ziemowit Pierzycki
>> > <ziemowit@xxxxxxxxxxxxx> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have a cluster of 10 servers, all running Fedora 24 along with
>> >> Gluster 3.8. I'm planning on doing rolling upgrades to Fedora 27
>> >> with Gluster 3.12. I saw the documentation and did some testing, but
>> >> I would like to run my plan through some (more?) educated minds.
>> >>
>> >> The current setup is:
>> >>
>> >> Volume Name: vol0
>> >> Distributed-Replicate
>> >> Number of Bricks: 2 x (2 + 1) = 6
>> >> Bricks:
>> >> Brick1: glt01:/vol/vol0
>> >> Brick2: glt02:/vol/vol0
>> >> Brick3: glt05:/vol/vol0 (arbiter)
>> >> Brick4: glt03:/vol/vol0
>> >> Brick5: glt04:/vol/vol0
>> >> Brick6: glt06:/vol/vol0 (arbiter)
>> >>
>> >> Volume Name: vol1
>> >> Distributed-Replicate
>> >> Number of Bricks: 2 x (2 + 1) = 6
>> >> Bricks:
>> >> Brick1: glt07:/vol/vol1
>> >> Brick2: glt08:/vol/vol1
>> >> Brick3: glt05:/vol/vol1 (arbiter)
>> >> Brick4: glt09:/vol/vol1
>> >> Brick5: glt10:/vol/vol1
>> >> Brick6: glt06:/vol/vol1 (arbiter)
>> >>
>> >> After performing the upgrade, because of the differences in checksums,
>> >> the upgraded nodes become:
>> >>
>> >> State: Peer Rejected (Connected)
>> >
>> > Have you upgraded all the nodes? If yes, have you bumped up the
>> > cluster.op-version after upgrading all the nodes? Please follow
>> > http://docs.gluster.org/en/latest/Upgrade-Guide/op_version/ for more
>> > details on how to bump up the cluster.op-version. In case you have
>> > done all of these and you're still seeing a checksum issue, then I'm
>> > afraid you have hit a bug. I'd need further details, like the checksum
>> > mismatch error from the glusterd.log file along with the exact
>> > volume info file (/var/lib/glusterd/vols/<volname>/info) from both
>> > peers, to debug this further.
>> >
>> >> If I start doing the upgrades one at a time, with nodes glt10 to
>> >> glt01 except for the arbiters glt05 and glt06, and then upgrading the
>> >> arbiters last, everything should remain online at all times through
>> >> the process. Correct?
>> >>
>> >> Thanks.
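On the cluster.op-version point in Atin's reply above: once every node has
been upgraded, the sequence in the linked upgrade guide is to check the
current and maximum supported op-version and then bump the cluster to that
maximum. A sketch, assuming all peers are already on the new version
(cluster.max-op-version needs glusterfs 3.10 or newer):

    gluster volume get all cluster.op-version
    gluster volume get all cluster.max-op-version

    # Bump to whatever max-op-version reports:
    maxop=$(gluster volume get all cluster.max-op-version | awk '/cluster.max-op-version/ {print $2}')
    gluster volume set all cluster.op-version "$maxop"

As Atin notes, this should only be done after all of the nodes have been
upgraded.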
--
Regards,
Hari Gowtham.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users