Re: Ceph Cluster Failures

Rich Rocque <rich.rocque@xxxxxxxxxxxx> · Fri, 17 Mar 2017 02:51:48 +0000

Hi,

I talked with the person in charge about your initial feedback and questions. The plan is to switch to a new setup, and I was asked to pass it along and ask whether it would be sufficient.

Use case:
Overview:
 Need to provide shared storage/high-availability for (usually) low-volume web server instances using distributed, POSIX-compliant filesystem, running in Amazon Web Services. Database storage is not part of the cluster.
Logic:
 We know Ceph is probably overkill for our current use (and probably also for our future use), so why Ceph? Its performance when using CephFS, and its ability to support RBD (if we ever move to a container approach for web servers). I’ve tried Amazon EFS (NFS-as-a-service)
 and GlusterFS (both NFS and native client), and because of the number of small files we’re working with, an operation that takes ~15 sec. in Ceph takes several minutes with the NFS or GlusterFS solutions.
Current Load:
 ~100 connected clients accessing ~20GB data of e-commerce related website source software.
Expected Future Load:
 ~5,000 connected clients accessing ~1TB of data

Ceph Clients:
Primary Role:
 Web server & load balancer w/ SSL termination
Hardware Configuration:
 1vCPU, 512MB RAM, Ubuntu 16.04 LTS (per website/domain/subdomain: 2 ea t2.nano instances, load balanced behind haproxy, rarely scaled up manually with new instances during expected load spikes. After the initial “hits,” most of the website stays in local cache,
 resulting in generally few IOPS against the Ceph cluster.)

Ceph Clusters:
Overall:
 3 co-located clusters across 9 servers, spanning 3 AWS Availability Zones in a single region. 3 MDS per cluster, 3 MON per cluster, 2 OSD per cluster.
Hardware Configuration (MON/MDS):
 r4.large instance class, 2vCPU, ~15GB RAM, “up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root (not provisioned-IOPS), Ubuntu 16.04 LTS
Hardware Configuration (OSD):
 i3.large instance class, 2vCPU, ~15GB RAM, “up to 10Gbit” network (“Enhanced Networking” enabled), EBS/SSD for root (not provisioned-IOPS, but “EBS optimized” for bandwidth), ~475GB NVMe attached, ephemeral storage for OSD (co-locating journal and data)

Proposed Layout:
AZ “A”:

Server A-MM (r4.large instance):

Mon.A & MDS.A for Cluster X
Mon.A & MDS.A for Cluster Y
Mon.A & MDS.A for Cluster Z
Server A-OSD-1 (i3.large instance):

OSD.0 for Cluster X
Server A-OSD-2 (i3.large instance):

OSD.0 for Cluster Z

AZ “B”:

Server B-MM (r4.large instance):

Mon.B & MDS.B for Cluster X
Mon.B & MDS.B for Cluster Y
Mon.B & MDS.B for Cluster Z
Server B-OSD-1 (i3.large instance):

OSD.1 for Cluster X
Server B-OSD-2 (i3.large instance):

OSD.0 for Cluster Y

AZ “C”:

Server C-MM (r4.large instance):

Mon.C & MDS.C for Cluster X
Mon.C & MDS.C for Cluster Y
Mon.C & MDS.C for Cluster Z
Server C-OSD-1 (i3.large instance):

OSD.1 for Cluster Y
Server C-OSD-2 (i3.large instance):

OSD.1 for Cluster Z
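
As a rough sanity check of the layout above, here is a minimal Python sketch that removes one Availability Zone at a time and counts what each cluster has left. The dictionary names and helper functions are hypothetical and only mirror the list above. It assumes the MON/MDS server in AZ "C" carries the third monitor of each cluster, and it relies on the fact that Ceph monitors need a strict majority to keep quorum.

```python
# Hypothetical encoding of the proposed layout; names are for illustration only.
AZS = ("A", "B", "C")
CLUSTERS = ("X", "Y", "Z")

# Each AZ hosts one MON/MDS server that carries a monitor for every cluster.
MON_AZS = {cluster: set(AZS) for cluster in CLUSTERS}

# (cluster, OSD id) -> AZ of the i3.large server hosting that OSD.
OSD_AZ = {
    ("X", 0): "A", ("X", 1): "B",
    ("Y", 0): "B", ("Y", 1): "C",
    ("Z", 0): "A", ("Z", 1): "C",
}


def survivors(failed_az):
    """Return {cluster: (monitors_left, osds_left)} after losing one AZ."""
    result = {}
    for cluster in CLUSTERS:
        mons_left = len(MON_AZS[cluster] - {failed_az})
        osds_left = sum(
            1
            for (c, _osd_id), az in OSD_AZ.items()
            if c == cluster and az != failed_az
        )
        result[cluster] = (mons_left, osds_left)
    return result


def has_quorum(mons_left, total=3):
    """Monitors keep quorum only while a strict majority is still up."""
    return mons_left > total // 2


for az in AZS:
    for cluster, (mons_left, osds_left) in survivors(az).items():
        print(
            f"AZ {az} down: cluster {cluster} "
            f"mons={mons_left} quorum={has_quorum(mons_left)} osds={osds_left}"
        )
```

Under these assumptions every cluster keeps monitor quorum when a single AZ is lost, but two of the three clusters drop to a single OSD each time.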

Alternative Layout:
Split each server's NVMe storage in half between 2 OSDs, providing 3 OSDs per cluster for higher
 availability at the expense of disk read/write performance, and increase the number of clusters to 4.
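
To compare the two layouts on capacity, here is a rough arithmetic sketch in Python. The ~475GB figure comes from the OSD hardware description above. The replica count (pool size) is not stated anywhere in this thread, so the values used below are assumptions, and the function names are hypothetical. Filesystem overhead and Ceph's full-ratio headroom are ignored.

```python
import math

NVME_GB = 475  # approximate NVMe capacity per i3.large, from the hardware list


def raw_capacity_gb(osds_per_cluster, osds_per_server):
    """Raw capacity of one cluster when each server's NVMe is split evenly."""
    return osds_per_cluster * NVME_GB / osds_per_server


def usable_capacity_gb(raw_gb, replicas):
    """Usable capacity for an ASSUMED replica count (not given in the thread)."""
    return raw_gb / replicas


def servers_needed(clusters, osds_per_cluster, osds_per_server):
    """Number of OSD servers required to host every OSD."""
    return math.ceil(clusters * osds_per_cluster / osds_per_server)


proposed_raw = raw_capacity_gb(osds_per_cluster=2, osds_per_server=1)
alternative_raw = raw_capacity_gb(osds_per_cluster=3, osds_per_server=2)

print("proposed:    raw", proposed_raw, "GB, usable at 2 replicas",
      usable_capacity_gb(proposed_raw, 2), "GB")
print("alternative: raw", alternative_raw, "GB, usable at 3 replicas",
      usable_capacity_gb(alternative_raw, 3), "GB")
print("OSD servers:", servers_needed(3, 2, 1), "vs", servers_needed(4, 3, 2))
```

Both layouts fit on the same 6 OSD servers. The alternative gives each cluster less raw space (about 712GB versus 950GB), and with the assumed replica counts the usable space per cluster roughly halves.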

Thank you for your time,

Rich

From: Christian Balzer <chibi@xxxxxxx>
Sent: Thursday, March 16, 2017 2:30:49 AM
To: Ceph Users
Cc: Robin H. Johnson; Rich Rocque
Subject: Re: Ceph Cluster Failures

Hello,

On Thu, 16 Mar 2017 02:44:29 +0000 Robin H. Johnson wrote:

> On Thu, Mar 16, 2017 at 02:22:08AM +0000, Rich Rocque wrote:

> > Has anyone else run into this or have any suggestions on how to remedy it?  

> We need a LOT more info.

>

Indeed.

> > After a couple months of almost no issues, our Ceph cluster has

> > started to have frequent failures. Just this week it's failed about

> > three times.

> >

> > The issue appears to be that an MDS or Monitor will fail and then all

> > clients hang. After that, all clients need to be forcibly restarted.  

> - Can you define monitor 'failing' in this case? 

> - What do the logs contain? 

> - Is it running out of memory?

> - Can you turn up the debug level?

> - Has your cluster experienced continual growth and now might be

>   undersized in some regard?

> 

A single MON failure should not cause any problems to boot.

Please also post the output of "ceph -s", "ceph osd tree" and "ceph osd pool ls detail".

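One way to gather that output in a single paste is a small Python wrapper around the three commands named above. This is only a sketch: the `collect` helper and its `run` parameter are hypothetical, and it assumes the `ceph` CLI is on the PATH of a node with a working admin keyring.

```python
import subprocess

# The three commands requested above, as argument lists.
DIAGNOSTIC_COMMANDS = [
    ["ceph", "-s"],
    ["ceph", "osd", "tree"],
    ["ceph", "osd", "pool", "ls", "detail"],
]


def collect(run=subprocess.run):
    """Run each command and return one text report with all outputs."""
    report = []
    for cmd in DIAGNOSTIC_COMMANDS:
        try:
            proc = run(cmd, capture_output=True, text=True, timeout=30)
            body = proc.stdout or proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Keep going so one missing or hung command does not hide the rest.
            body = f"failed to run: {exc}\n"
        report.append(f"$ {' '.join(cmd)}\n{body}")
    return "\n".join(report)
```

Calling `print(collect())` on a monitor node would produce a report that can be attached to a reply. The timeout matters here because `ceph -s` can block when the monitors have no quorum.
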
> > The architecture for our setup is:  

> Are these virtual machines? The overall specs seem more like VM

> instances than hardware.

>

There are small servers like that, but a valid question indeed.

In particular, if it is dedicated HW, FULL specs.

> > 3 ea MON, MDS instances (co-located) on 2cpu, 4GB RAM servers  

> What sort of SSD are the monitor datastores on? ('mon data' in the

> config)

> 

He doesn't mention SSDs in the MON/MDS context, so we could be looking at

something even slower. FULL SPECS. 

4GB RAM would be fine for a single MON, but combined with MDS it may

be a bit tight.

> > 12 ea OSDs (ssd), on 1cpu, 1GB RAM servers  

> 12 SSDs to a single server, with 1cpu/1GB RAM? That's absurdly low-spec.

> How many OSD servers, what SSDs?

> 

I think he means 12 individual servers. Again, there are micro servers

like that around, like:

https://www.supermicro.com.tw/products/system/2U/2015/SYS-2015TA-HTRF.cfm

IF the SSDs are decent, CPU may be tight but 1GB RAM for a combination of

OS _and_ OSD is way too little for my taste and experience.

Christian

> What is the network setup & connectivity between them (hopefully

> 10Gbit).

> 

-- 

Christian Balzer        Network/Systems Engineer                

chibi@xxxxxxx    Global OnLine Japan/Rakuten Communications

http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com