Hello,

On Thu, 16 Mar 2017 02:44:29 +0000 Robin H. Johnson wrote:

> On Thu, Mar 16, 2017 at 02:22:08AM +0000, Rich Rocque wrote:
> > Has anyone else run into this or have any suggestions on how to remedy it?
> We need a LOT more info.
>
Indeed.

> > After a couple months of almost no issues, our Ceph cluster has
> > started to have frequent failures. Just this week it's failed about
> > three times.
> >
> > The issue appears to be that an MDS or Monitor will fail and then all
> > clients hang. After that, all clients need to be forcibly restarted.
> - Can you define monitor 'failing' in this case?
> - What do the logs contain?
> - Is it running out of memory?
> - Can you turn up the debug level?
> - Has your cluster experienced continual growth and now might be
>   undersized in some regard?
>
A single MON failure should not cause any problems to boot.

Please post the output of "ceph -s", "ceph osd tree" and
"ceph osd pool ls detail" as well.

> > The architecture for our setup is:
> Are these virtual machines? The overall specs seem rather like VM
> instances rather than hardware.
>
There are small servers like that, but a valid question indeed.
In particular, if it is dedicated HW, FULL specs.

> > 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers
> What sort of SSD are the monitor datastores on? ('mon data' in the
> config)
>
He doesn't mention SSDs in the MON/MDS context, so we could be looking at
something even slower.
FULL SPECS.

4GB RAM would be fine for a single MON, but combined with MDS it may be a
bit tight.

> > 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers
> 12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.
> How many OSD servers, what SSDs?
>
I think he means 12 individual servers.
Again, there are micro servers like that around, like:
https://www.supermicro.com.tw/products/system/2U/2015/SYS-2015TA-HTRF.cfm

IF the SSDs are decent, CPU may be tight, but 1GB RAM for a combination of
OS _and_ OSD is way too little for my taste and experience.

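
The three outputs asked for above could be gathered into one file in a
single pass. What follows is only a minimal sketch: the function name
collect_ceph_info and the default file name ceph-info.txt are made up for
illustration, and it assumes it is run on a node where the ceph CLI can
reach the cluster. Only the three ceph commands themselves come from this
thread.

```shell
#!/bin/sh
# Sketch: collect the cluster state requested in this thread into one file.
# collect_ceph_info and the default output name are hypothetical helpers,
# not part of Ceph; the ceph sub-commands are the ones named in the mail.
collect_ceph_info() {
    out="${1:-ceph-info.txt}"
    : > "$out"    # start from an empty file
    for args in "-s" "osd tree" "osd pool ls detail"; do
        {
            echo "===== ceph $args ====="
            # $args is deliberately unquoted so "osd tree" splits into words
            ceph $args 2>&1 || echo "(command failed)"
            echo
        } >> "$out"
    done
}
```

Calling it as "collect_ceph_info /tmp/ceph-info.txt" and attaching the
resulting file to a reply would cover the status, OSD tree and pool
details in one go. A failing command is recorded in the file rather than
aborting the run, so a partial result is still usable.
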
Christian

> What is the network setup & connectivity between them (hopefully
> 10Gbit).
>
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com