What is your Gluster version? There was a bug in 3.10 where some
bricks may not come online after a node reboot; it was fixed in later
versions.
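To check, something like the following on each node shows the installed
version and which brick process failed to start. The brick log filename
is derived from the brick path, so the name below is only my guess for
your layout:

    # installed GlusterFS version on this node
    gluster --version

    # which bricks and daemons are online (run on any peer)
    gluster volume status shared

    # brick log on gluster13; exact filename is derived from the brick path
    less /var/log/glusterfs/bricks/gluster-bricksdd1_new-shared.log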
On 8/16/18, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> Hi there,
>
> Twice I had to replace a brick on two different servers; the replace went
> fine, and the heal took very long but finally finished. From time to time
> you have to reboot the server (kernel upgrades), and I've noticed that the
> replaced brick doesn't come up after the reboot. Status after reboot:
>
> gluster volume status
> Status of volume: shared
> Gluster process                                TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster11:/gluster/bricksda1/shared      49164     0          Y       6425
> Brick gluster12:/gluster/bricksda1/shared      49152     0          Y       2078
> Brick gluster13:/gluster/bricksda1/shared      49152     0          Y       2478
> Brick gluster11:/gluster/bricksdb1/shared      49165     0          Y       6452
> Brick gluster12:/gluster/bricksdb1/shared      49153     0          Y       2084
> Brick gluster13:/gluster/bricksdb1/shared      49153     0          Y       2497
> Brick gluster11:/gluster/bricksdc1/shared      49166     0          Y       6479
> Brick gluster12:/gluster/bricksdc1/shared      49154     0          Y       2090
> Brick gluster13:/gluster/bricksdc1/shared      49154     0          Y       2485
> Brick gluster11:/gluster/bricksdd1/shared      49168     0          Y       7897
> Brick gluster12:/gluster/bricksdd1_new/shared  49157     0          Y       7632
> Brick gluster13:/gluster/bricksdd1_new/shared  N/A       N/A        N       N/A
> Self-heal Daemon on localhost                  N/A       N/A        Y       25483
> Self-heal Daemon on gluster13                  N/A       N/A        Y       2463
> Self-heal Daemon on gluster12                  N/A       N/A        Y       17619
>
> Task Status of Volume shared
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> Here gluster13:/gluster/bricksdd1_new/shared is not up. Related log
> messages after the reboot in glusterd.log:
>
> [2018-08-16 05:22:52.986757] W [socket.c:593:__socket_rwv] 0-management:
> readv on /var/run/gluster/02d086b75bfc97f2cce96fe47e26dcf3.socket failed
> (No data available)
> [2018-08-16 05:22:52.987648] I [MSGID: 106005]
> [glusterd-handler.c:6071:__glusterd_brick_rpc_notify] 0-management:
> Brick gluster13:/gluster/bricksdd1_new/shared has disconnected from glusterd.
> [2018-08-16 05:22:52.987908] E [rpc-clnt.c:350:saved_frames_unwind]
> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x13e)[0x7fdbaa398b8e]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fdbaa15f111]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fdbaa15f23e]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7fdbaa1608d1]
> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x288)[0x7fdbaa1613f8]
> ))))) 0-management: forced unwinding frame type(brick operations) op(--(4))
> called at 2018-08-16 05:22:52.941332 (xid=0x2)
> [2018-08-16 05:22:52.988058] W [dict.c:426:dict_set]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.12/xlator/mgmt/glusterd.so(+0xd1e59) [0x7fdba4f9ce59]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set_int32+0x2b) [0x7fdbaa39122b]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_set+0xd3) [0x7fdbaa38fa13]
> ) 0-dict: !this || !value for key=index [Invalid argument]
> [2018-08-16 05:22:52.988092] E [MSGID: 106060]
> [glusterd-syncop.c:1014:gd_syncop_mgmt_brick_op] 0-management: Error
> setting index on brick status rsp dict
>
> This problem could be related to my previous mail. After executing
> "gluster volume start shared force" the brick comes up, which results in
> a heal of the brick (and in high load, too). Is there any way to track
> down why this happens, and to ensure that the brick comes up at boot?
>
> Best regards
> Hubert
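For reference, the "start ... force" workaround described above, as
commands; the volume name "shared" is taken from your status output, and
"heal ... info" is only one way to watch the heal catch up afterwards:

    # bring up any bricks that failed to start
    gluster volume start shared force

    # confirm the brick process is now online
    gluster volume status shared

    # watch the pending heal entries drain
    gluster volume heal shared info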