A couple of weeks ago, while upgrading our 12-node cluster (one brick on each node, replicated in pairs) from 3.4 to 3.5, one of the nodes spontaneously killed all userland processes. It happened either when I ran service glusterfs-server restart or when I ran gluster volume status immediately afterwards - both were pasted as a one-liner, so I can't tell which one triggered it.
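For reference, the pasted one-liner was effectively this (from memory, so the exact form may have differed slightly):

    service glusterfs-server restart; gluster volume status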
Today I needed to restart glusterfsd on a different node, so I killed glusterfsd and then restarted glusterfs-server as a precaution, and it caused the exact same issue - all userland processes were killed (including my ssh connection).
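Roughly what I ran on that node today (again from memory; the actual kill invocation may have been slightly different):

    pkill glusterfsd
    service glusterfs-server restart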
Has anyone else experienced a similar issue, or can anyone point me at anything to look at to figure out the cause?
At present nothing stands out in the logs themselves - glustershd.log records up to the event happening and then nothing. The other logs for the bricks and volumes look normal and only show a single startup from when I logged back in and restarted the services that had been killed.
System logs don't have any traces either - just the notices of services starting back up (e.g. rsyslogd).
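For what it's worth, these are the places I looked (assuming the default log paths used by the Ubuntu PPA packages):

    /var/log/glusterfs/glustershd.log    # self-heal daemon - records up to the event, then nothing
    /var/log/glusterfs/bricks/*.log      # brick logs - look normal, one restart logged
    /var/log/syslog and dmesg            # only the service start-up notices afterwards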
Servers: Ubuntu 14.04 using the GlusterFS PPA. Volume info:
Volume Name: volumes
Type: Distributed-Replicate
Volume ID: 4dbcd901-af7a-437b-90c7-6b45def03748
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: yyc-c01:/openstack_zfs/volumes/gluster
Brick2: yyc-c02:/openstack_zfs/volumes/gluster
Brick3: yyc-c03:/openstack_zfs/volumes/gluster
Brick4: yyc-c04:/openstack_zfs/volumes/gluster
Brick5: yyc-c05:/openstack_zfs/volumes/gluster
Brick6: yyc-c06:/openstack_zfs/volumes/gluster
Brick7: yyc-c07:/openstack_zfs/volumes/gluster
Brick8: yyc-c08:/openstack_zfs/volumes/gluster
Brick9: yyc-c09:/openstack_zfs/volumes/gluster
Brick10: yyc-c10:/openstack_zfs/volumes/gluster
Brick11: yyc-c11:/openstack_zfs/volumes/gluster
Brick12: yyc-c12:/openstack_zfs/volumes/gluster
Thanks,
-- Micheal