Re: Ceph Cluster Failures

Hello,

On Thu, 16 Mar 2017 02:44:29 +0000 Robin H. Johnson wrote:

> On Thu, Mar 16, 2017 at 02:22:08AM +0000, Rich Rocque wrote:
> > Has anyone else run into this or have any suggestions on how to remedy it?  
> We need a LOT more info.
>
Indeed.
 
> > After a couple months of almost no issues, our Ceph cluster has
> > started to have frequent failures. Just this week it's failed about
> > three times.
> >
> > The issue appears to be that an MDS or Monitor will fail and then all
> > clients hang. After that, all clients need to be forcibly restarted.  
> - Can you define monitor 'failing' in this case? 
> - What do the logs contain? 
> - Is it running out of memory?
> - Can you turn up the debug level?
> - Has your cluster experienced continual growth and now might be
>   undersized in some regard?
> 
A single MON failure should not cause any problems in the first place; with
three MONs, the remaining two still form a quorum.
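Ceph monitors need a strict majority of the configured MONs to be up to
hold quorum. Here is a quick sketch of that majority arithmetic. It is plain
Python illustrating the rule, not a Ceph tool or API:

```python
def quorum_size(num_mons: int) -> int:
    """Smallest strict majority of num_mons monitors."""
    if num_mons < 1:
        raise ValueError("need at least one monitor")
    return num_mons // 2 + 1


def tolerated_failures(num_mons: int) -> int:
    """How many monitors can be down while a majority is still reachable."""
    return num_mons - quorum_size(num_mons)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} MONs: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

With 3 MONs that gives quorum=2 and one tolerated failure. Note that 2 MONs
tolerate no failure at all, which is why odd MON counts are the norm.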

Please also post the output of "ceph -s", "ceph osd tree" and
"ceph osd pool ls detail".
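If it helps, here is a minimal sketch for gathering those three outputs in
one go. It assumes Python 3.7+ and a host where the `ceph` CLI and an admin
keyring are available. The helper names are just illustrative; the script
only shells out to the commands listed above:

```python
import subprocess
from typing import Callable, List

# The three commands asked for above, each as an argv list.
COMMANDS: List[List[str]] = [
    ["ceph", "-s"],
    ["ceph", "osd", "tree"],
    ["ceph", "osd", "pool", "ls", "detail"],
]


def run_command(argv: List[str]) -> str:
    """Run one command and return its stdout, or a note describing the failure."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # OSError covers a missing binary (FileNotFoundError).
        return f"<failed: {exc}>"
    if result.returncode != 0:
        return f"<exit {result.returncode}>\n{result.stderr}"
    return result.stdout


def collect_report(run: Callable[[List[str]], str] = run_command) -> str:
    """Join each command line and its output into one pasteable text report."""
    sections = []
    for argv in COMMANDS:
        sections.append("# " + " ".join(argv) + "\n" + run(argv).rstrip("\n"))
    return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(collect_report())
```

The `run` parameter exists only so the report formatting can be checked
without a live cluster.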

> > The architecture for our setup is:  
> Are these virtual machines? The overall specs seem rather like VM
> instances rather than hardware.
>
There are small servers like that, but it's a valid question indeed.
In particular, if it is dedicated HW, please give FULL specs.
 
> > 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers  
> What sort of SSD are the monitor datastores on? ('mon data' in the
> config)
> 
He doesn't mention SSDs in the MON/MDS context, so we could be looking at
something even slower. FULL SPECS. 

4GB RAM would be fine for a single MON, but combined with MDS it may
be a bit tight.

> > 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers  
> 12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
> How many OSD servers, what SSDs?
> 
I think he means 12 individual servers. Again, there are micro servers
like that around, like:
https://www.supermicro.com.tw/products/system/2U/2015/SYS-2015TA-HTRF.cfm

IF the SSDs are decent, CPU may be tight, but 1GB RAM for a combination of
OS _and_ OSD is way too little for my taste and experience.
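To answer the "is it running out of memory?" question with numbers rather
than guesses, here is a rough sketch that reads the standard Linux /proc
files and reports the resident memory of the ceph-osd, ceph-mon and ceph-mds
processes on a node. It is Linux-only, assumes it can read /proc/<pid>/status
for those processes, and the function names are hypothetical:

```python
import os
from typing import Dict, Optional

CEPH_DAEMONS = ("ceph-osd", "ceph-mon", "ceph-mds")


def field_kib(text: str, field: str) -> Optional[int]:
    """Return the number from a 'Field:   1234 kB' line in /proc status/meminfo text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            parts = rest.split()
            return int(parts[0]) if parts else None
    return None


def ceph_daemon_rss(proc: str = "/proc") -> Dict[str, int]:
    """Map 'name[pid]' to resident set size (VmRSS, KiB) for running Ceph daemons."""
    usage = {}
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "comm")) as f:
                comm = f.read().strip()
            if comm not in CEPH_DAEMONS:
                continue
            with open(os.path.join(proc, pid, "status")) as f:
                rss = field_kib(f.read(), "VmRSS")
        except OSError:
            continue  # process exited meanwhile or is not readable
        if rss is not None:
            usage[f"{comm}[{pid}]"] = rss
    return usage


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    print("MemTotal KiB:", field_kib(meminfo, "MemTotal"))
    print("MemAvailable KiB:", field_kib(meminfo, "MemAvailable"))
    for name, rss in sorted(ceph_daemon_rss().items()):
        print(f"{name}: {rss} KiB resident")
```

Run it on an OSD node and on a MON/MDS node while the cluster is busy, and
check the kernel log (dmesg) for OOM-killer messages around the failures.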

Christian

> What is the network setup & connectivity between them (hopefully
> 10Gbit).
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
