Re: Self-Heal Daemon not starting after upgrade 6.10 to 7.8

On 02/11/20 8:35 pm, Olaf Buitelaar wrote:
Dear Gluster users,

I'm trying to upgrade from Gluster 6.10 to 7.8. I've currently tried this on 2 hosts, but on both the Self-Heal Daemon refuses to start.
It could be because not all nodes are updated yet, but I'm a bit hesitant to continue without the Self-Heal Daemon running.
I'm not using quotas, and I'm not seeing the peer-reject messages that other users reported on the mailing list.
In fact, gluster peer status and gluster pool list show all nodes as connected.
gluster v heal <vol> info also shows all nodes as Status: Connected, though some report pending heals which don't really seem to progress.
Only gluster v status <vol> shows the Self-Heal Daemon on the 2 upgraded nodes as not running:

Self-heal Daemon on localhost               N/A       N/A        N       N/A
Self-heal Daemon on 10.32.9.5               N/A       N/A        Y       24022
Self-heal Daemon on 10.201.0.4              N/A       N/A        Y       26704
Self-heal Daemon on 10.201.0.3              N/A       N/A        N       N/A
Self-heal Daemon on 10.32.9.4               N/A       N/A        Y       46294
Self-heal Daemon on 10.32.9.3               N/A       N/A        Y       22194
Self-heal Daemon on 10.201.0.9              N/A       N/A        Y       14902
Self-heal Daemon on 10.201.0.6              N/A       N/A        Y       5358
Self-heal Daemon on 10.201.0.5              N/A       N/A        Y       28073
Self-heal Daemon on 10.201.0.7              N/A       N/A        Y       15385
Self-heal Daemon on 10.201.0.1              N/A       N/A        Y       8917
Self-heal Daemon on 10.201.0.12             N/A       N/A        Y       56796
Self-heal Daemon on 10.201.0.8              N/A       N/A        Y       7990
Self-heal Daemon on 10.201.0.11             N/A       N/A        Y       68223
Self-heal Daemon on 10.201.0.10             N/A       N/A        Y       20828
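
For reference, a quick way to pull just the Self-heal Daemon lines for every volume is a small loop over the standard CLI (a sketch; volume names come from gluster volume list):

for v in $(gluster volume list); do
  echo "== $v =="
  gluster volume status "$v" | grep 'Self-heal Daemon'
done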

After the upgrade I see the file /var/lib/glusterd/vols/<vol>/<vol>-shd.vol being created, which doesn't exist on the 6.10 nodes.
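
To compare nodes, the presence of that per-volume shd volfile can be checked directly (a sketch; <vol> is a placeholder, as above):

# exists on the upgraded 7.8 nodes, absent on the 6.10 nodes
ls -l /var/lib/glusterd/vols/<vol>/<vol>-shd.vol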

In the logs I see these relevant messages:
log: glusterd.log
0-management: Regenerating volfiles due to a max op-version mismatch or glusterd.upgrade file not being present, op_version retrieved:60000, max op_version: 70200
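
Both numbers in that message can be inspected from the CLI; as I understand the upgrade procedure, cluster.op-version should only be raised once every node runs the new version:

# current cluster op-version (60000 here, per the log line above)
gluster volume get all cluster.op-version
# only after ALL nodes are on 7.8:
gluster volume set all cluster.op-version 70200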

I think this is because of the shd multiplexing (https://bugzilla.redhat.com/show_bug.cgi?id=1659708) added by Rafi.

Rafi, is there any workaround that works for rolling upgrades? Or should we just do an offline upgrade of all server nodes for the shd to come online?
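
If it has to be offline, I'd expect roughly the following on every server node (a rough sketch; the package and service names are assumptions and depend on the distro):

systemctl stop glusterd        # stop the management daemon
pkill glusterfsd               # stop the brick processes
pkill glusterfs                # stop shd and any remaining gluster processes
yum update glusterfs-server    # assumed package name; use your distro's equivalent
systemctl start glusterd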

-Ravi



[2020-10-31 21:48:42.256193] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: tier-enabled
[2020-10-31 21:48:42.256232] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-0
[2020-10-31 21:48:42.256240] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-1
[2020-10-31 21:48:42.256246] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-2
[2020-10-31 21:48:42.256251] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-3
[2020-10-31 21:48:42.256256] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-4
[2020-10-31 21:48:42.256261] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-5
[2020-10-31 21:48:42.256266] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-6
[2020-10-31 21:48:42.256271] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-7
[2020-10-31 21:48:42.256276] W [MSGID: 106204] [glusterd-store.c:3275:glusterd_store_update_volinfo] 0-management: Unknown key: brick-8

[2020-10-31 21:51:36.049009] W [MSGID: 106617] [glusterd-svc-helper.c:948:glusterd_attach_svc] 0-glusterd: attach failed for glustershd(volume=backups)
[2020-10-31 21:51:36.049055] E [MSGID: 106048] [glusterd-shd-svc.c:482:glusterd_shdsvc_start] 0-glusterd: Failed to attach shd svc(volume=backups) to pid=9262
[2020-10-31 21:51:36.049138] E [MSGID: 106615] [glusterd-shd-svc.c:638:glusterd_shdsvc_restart] 0-management: Couldn't start shd for vol: backups on restart
[2020-10-31 21:51:36.183133] I [MSGID: 106618] [glusterd-svc-helper.c:901:glusterd_attach_svc] 0-glusterd: adding svc glustershd (volume=backups) to existing process with pid 9262

log: glustershd.log

[2020-10-31 21:49:55.976120] I [MSGID: 100041] [glusterfsd-mgmt.c:1111:glusterfs_handle_svc_attach] 0-glusterfs: received attach request for volfile-id=shd/backups
[2020-10-31 21:49:55.976136] W [MSGID: 100042] [glusterfsd-mgmt.c:1137:glusterfs_handle_svc_attach] 0-glusterfs: got attach for shd/backups but no active graph [Invalid argument]
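
For completeness, the usual way to re-trigger an shd spawn/attach on a single node is restarting glusterd or force-starting the volume (a sketch, not a confirmed fix for this particular bug):

systemctl restart glusterd          # glusterd re-attempts the shd attach on start
gluster volume start backups force  # force-start also respawns the self-heal daemon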

So I suspect something in the logic for the self-heal daemon has changed, since it now has the new *.vol configuration for the shd. The question is: is this just a transitional state until all nodes are upgraded, and thus safe to continue the update? Or is this something that should be fixed, and if so, any clues how?

Thanks, Olaf

________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
