Complete machine lockup, v3.4.2

Hi everyone,

We’ve been using GlusterFS 3.3.2 for a while and recently started migrating to 3.4.2. The 3.4.2 nodes run CentOS 6.5 (the 3.3.2 nodes were on CentOS 6.4).

Recently, we had a very scary failure, and we do not know exactly what caused it.

We have a 3-node cluster with a replication factor of 3. Each node has one brick, which is a single RAID0 volume made of multiple SSDs.

Following some read/write errors, nodes 2 and 3 locked up completely. Nothing could be done on them (nothing on the console, no SSH access), so we had to power-cycle them physically. Node 1 was still accessible, but its FUSE client rejected most, if not all, reads and writes.

Has anyone experienced something similar?

Before the systems froze, the last thing the kernel logged was hung-task warnings for HTTPD threads (“INFO: task httpd:7910 blocked for more than 120 seconds.”). End users talk to Apache to read from and write to the Gluster volume, so it looks like a simple case of “something wrong” in Gluster blocking reads and writes until the kernel reports the stuck Apache threads as hung.

At this point, we’re unsure where to look. We found nothing specific in the logs, but if someone has pointers on what to look for, that could give us a new lead.

Thanks,

Laurent Chouinard

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
