Re: Command "/etc/init.d/glusterd start" failed

I have been plagued by errors of this kind every so often, mainly because we are in a development phase and reboot our servers frequently. If you start glusterd in debug mode:

sh$ glusterd --debug

you can easily pinpoint exactly which volume/peer data is causing the initialization failure for mgmt/glusterd.
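For example, to surface just the error lines (a minimal sketch; glusterd log lines tag errors with "E", as in the log excerpts quoted below):

sh$ glusterd --debug 2>&1 | grep ' E \['

The file and function named in each "E" line (e.g. glusterd-store.c) usually point at the offending file under /var/lib/glusterd.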

In addition, in my experience, two of the leading causes of failure are:
a) bad peer data, left behind if glusterd is somehow killed during an active peer probe operation, and
b) a half-written volume file: when glusterd needs to update the metadata for a volume or brick (say the "info" file for volume testvol) under /var/lib/glusterd, it first renames /var/lib/glusterd/vols/testvol/info to info.tmp, and then creates a new info file which is written into freshly. If glusterd crashes at that point, glusterd startup will keep failing until this is resolved manually. Usually, moving info.tmp back to info works for me (see the sketch after this list).
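A recovery sketch for both cases (paths are the defaults; the peer UUID is a placeholder, and you should back up /var/lib/glusterd before changing anything):

# case (a): remove the stale peer file left by an interrupted probe
sh$ ls /var/lib/glusterd/peers/
sh$ rm /var/lib/glusterd/peers/<uuid-of-bad-peer>

# case (b): restore the half-written volume info file
sh$ mv /var/lib/glusterd/vols/testvol/info.tmp /var/lib/glusterd/vols/testvol/info

Then start glusterd again and verify with "gluster volume info".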

Thanks,
Anirban

On Saturday 12 April 2014 08:45 AM, 吴保川 wrote:
It is tcp.

[root@server1 wbc]# gluster volume info
 
Volume Name: gv_replica
Type: Replicate
Volume ID: 81014863-ee59-409b-8897-6485d411d14d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.3:/home/wbc/vdir/gv_replica
Brick2: 192.168.1.4:/home/wbc/vdir/gv_replica
 
Volume Name: gv1
Type: Distribute
Volume ID: cfe2b8a0-284b-489d-a153-21182933f266
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.4:/home/wbc/vdir/gv1
Brick2: 192.168.1.3:/home/wbc/vdir/gv1

Thanks,
Baochuan Wu



2014-04-12 10:11 GMT+08:00 Nagaprasad Sathyanarayana <nsathyan@xxxxxxxxxx>:
If you run

 # gluster volume info

what is the value set for transport-type?

Thanks
Naga


On 12-Apr-2014, at 7:33 am, 吴保川 <wildpointercs@xxxxxxxxx> wrote:

Thanks, Joe. I found that one of my machines had been assigned the wrong IP address, which led to the error.
Originally, I thought the following error was the critical one:
[2014-04-11 18:12:03.433371] E [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: /usr/local/lib/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open shared object file: No such file or directory


2014-04-12 5:34 GMT+08:00 Joe Julian <joe@xxxxxxxxxxxxxxxx>:
On 04/11/2014 11:18 AM, 吴保川 wrote:
[2014-04-11 18:12:05.165989] E [glusterd-store.c:2663:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
I'm pretty sure that means that one of the bricks isn't resolved in your list of peers.
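One way to check this (a sketch; the addresses are the brick hosts from the volume info above):

sh$ gluster peer status
sh$ cat /var/lib/glusterd/peers/*

Every brick host (here 192.168.1.3 and 192.168.1.4) must be either the local node or listed as a peer with a matching address; if a peer's IP has changed, glusterd cannot resolve the bricks on restore.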


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
