Atin,
Could it be because bricks are started with PROC_START_NO_WAIT?
Pranith
On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
Hello everyone,
I have a problem that I am trying to resolve and I am not sure which way to go, so I am asking for your advice.
What it comes down to is that upon the initial boot of all my GlusterFS machines the shared volume doesn't get mounted. Nevertheless, the volume is successfully created and started, and further attempts to mount it manually succeed. I suspect what's happening is that the gluster processes/bricks/etc. haven't fully started by the time the /etc/fstab entry is read and the initial mount attempt is made. Again, by the time I log in and run mount -a, the volume mounts without any issues.
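Concretely, the by-hand recovery is nothing more than this (assuming the glusterfs-server init script as shipped in the Debian packaging):

service glusterfs-server status    # glusterd is already up by the time I log in
mount -a                           # re-reads /etc/fstab; the glusterfs mount now succeeds
df -h /opt/shared                  # confirms the volume is mounted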
Details from the logs:
[2015-03-30 22:29:04.381918] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-server=10.12.130.21 --volfile-server=10.12.130.22 --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
[2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-03-30 22:29:04.394950] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2015-03-30 22:29:04.394964] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.21
[2015-03-30 22:29:08.390687] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.21 (Transport endpoint is not connected)
[2015-03-30 22:29:08.390720] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.22
[2015-03-30 22:29:11.392015] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.22 (Transport endpoint is not connected)
[2015-03-30 22:29:11.392050] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.23
[2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex] 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-30 22:29:14.408964] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting frame-timeout to 60
[2015-03-30 22:29:14.409183] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting frame-timeout to 60
[2015-03-30 22:29:14.409388] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting frame-timeout to 60
[2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2: parent translators are ready, attempting connect on transport
Final graph:
....
[2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify] 0-host-client-2: disconnected from host-client-2. Client process will keep trying to connect to glusterd until brick's port is available
[2015-03-30 22:29:14.411063] E [MSGID: 108006] [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
[2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /opt/shared
[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down
[2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/opt/shared'.
Relevant /etc/fstab entries are:
/dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0
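For reference, the manual mount that works after boot is roughly the equivalent of that fstab line, i.e. something like:

mount -t glusterfs \
    -o attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 \
    localhost:/myvolume /opt/shared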
Volume configuration is:
Volume Name: myvolume
Type: Replicate
Volume ID: xxxx
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1:/opt/local/brick
Brick2: host2:/opt/local/brick
Brick3: host3:/opt/local/brick
Options Reconfigured:
storage.health-check-interval: 5
network.ping-timeout: 5
nfs.disable: on
auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
cluster.quorum-type: auto
network.frame-timeout: 60
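(For completeness, a volume with this layout would be created along these lines; a sketch reconstructed from the info output above, not the exact commands I ran:)

gluster volume create myvolume replica 3 transport tcp \
    host1:/opt/local/brick host2:/opt/local/brick host3:/opt/local/brick
gluster volume set myvolume network.ping-timeout 5
gluster volume set myvolume cluster.quorum-type auto
gluster volume start myvolume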
I run Debian 7 with GlusterFS version 3.6.2-2.
While I could put together some rc.local-type script which retries mounting the volume until it succeeds or times out (a rough sketch of what I mean is below), I was wondering if there's a better way to solve this problem?
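A minimal sketch of that workaround, which just retries the fstab-driven mount until the FUSE mount shows up in /proc/mounts or about two minutes pass:

#!/bin/sh
# Retry mounting the shared GlusterFS volume via its fstab entry until it
# appears in /proc/mounts, waiting 5 seconds between attempts (~2 min total).
tries=0
until grep -q ' /opt/shared fuse.glusterfs ' /proc/mounts; do
    if [ "$tries" -ge 24 ]; then
        echo "giving up on mounting /opt/shared" >&2
        exit 1
    fi
    mount /opt/shared 2>/dev/null
    tries=$((tries + 1))
    sleep 5
done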
Thank you for your help.
Regards,
--
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users