Re: Ceph Very Small Cluster

Ranjan,

If you unmount the file system on both nodes and then gracefully stop the Ceph services on one node (or even yank that node's network cable), what state is your cluster in? Are you able to do a basic rados bench write and read?
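For example, something like this against your data pool (I'm assuming the default CephFS pool name cephfs_data here, substitute whatever yours is called):

# 10-second write benchmark, keeping the objects around for the read test
rados bench -p cephfs_data 10 write --no-cleanup
# sequential read of the objects written above
rados bench -p cephfs_data 10 seq
# remove the benchmark objects afterwards
rados -p cephfs_data cleanup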

How are you mounting CephFS, with the kernel client or the FUSE client (ceph-fuse)? Have you tested both to see whether you get the same issue with blocked requests?
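For reference, the two mount methods look roughly like this (monitor address, client name and secret file are placeholders, adjust them to your setup):

# kernel client
mount -t ceph server0:6789:/ /var/www -o name=admin,secretfile=/etc/ceph/admin.secret
# FUSE client
ceph-fuse -m server0:6789 /var/www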

When you say an OSD on each node, are we talking about literally 1 OSD daemon on each node? What is the storage behind that?
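The output of these would show the layout and how full each OSD is:

# CRUSH tree: which OSD daemons sit on which host
ceph osd tree
# per-OSD size and utilisation
ceph osd df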




On Wed, Sep 28, 2016 at 4:03 PM, Ranjan Ghosh <ghosh@xxxxxx> wrote:
Hi everyone,

Up until recently, we were using GlusterFS to keep two web servers in sync so we could take one down and switch back and forth between them, e.g. for maintenance or failover. Usually, both were running, though. Unfortunately, the performance was abysmal: copying many small files on the file system caused outages of several minutes, which is simply unacceptable. So I found Ceph. It's fairly new, but I thought I'd give it a try. I especially liked the good, detailed documentation, the configurability and the many command-line tools that let you find out what is going on in your cluster. All of this is severely lacking in GlusterFS, IMHO.

Because we're on a very tiny budget for this project, we cannot currently have more than two file system servers. I added a small virtual server, though, used only for monitoring, so at least we have 3 monitor nodes. I also created 3 MDSs, though as far as I understand, two of them are only standbys. To sum it up, we have:

server0: Admin (Deployment started from here) + Monitor + MDS
server1: Monitor + MDS + OSD
server2: Monitor + MDS + OSD

So, the OSDs are on server1 and server2, which sit next to each other and are connected by a local Gigabit Ethernet link. The cluster is mounted (also on server1 and server2) as /var/www, and Apache serves files off the cluster.

I've used these configuration settings:

osd pool default size = 2
osd pool default min size = 1
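(As far as I understand, these defaults only apply to newly created pools; for existing pools the same values can be set at runtime. The pool names below are the usual CephFS defaults, ours might be named differently:)

ceph osd pool set cephfs_data size 2
ceph osd pool set cephfs_data min_size 1
# verify the current values
ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size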

My idea was that by default everything should be replicated on 2 servers, i.e. each file is normally written to both server1 and server2. In case of an emergency (one server has a failure), though, it's better to keep operating and only write the file to one server. Therefore, I set min_size = 1. My further understanding is (correct me if I'm wrong) that when the failed server comes back online, the files that were written to only one server during the outage will automatically be replicated to the server that has come back.
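As far as I can tell, that re-replication should show up as recovery in the status output, e.g.:

# overall cluster status, including degraded/recovering PGs
ceph -s
# follow the cluster log live while the node rejoins
ceph -w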

So far, so good. With both servers online, the performance is light-years ahead of sluggish GlusterFS. I've also worked with XtreemFS, OCFS2 and AFS and never got such good performance from any cluster. In fact, it's so blazingly fast that I had to check twice that I really had the cluster mounted and wasn't accidentally working on the local hard drive. Impressive. I can edit files on server1 and they are immediately changed on server2, and vice versa. Great!

Unfortunately, when I now stop all Ceph services on server1, the websites on server2 start to hang/freeze, and "ceph health" shows "#x blocked requests". Now, what I don't understand: why is it blocking? Shouldn't both servers have the file? And didn't I set min_size to "1"? And if there are a few files (could be some unimportant stuff) missing on one of the servers: how can I abort the blocking? I'd rather have a missing file or whatever than a completely blocked website.
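My understanding is that one can dig into those blocked requests with something like the following (osd.0 is just an example id; the admin socket command has to be run on the node that hosts that OSD):

# shows which OSDs/PGs the blocked requests are stuck on
ceph health detail
# dump the operations currently in flight on a given OSD
ceph daemon osd.0 dump_ops_in_flight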

Are my files really duplicated 1:1, or are they perhaps spread evenly between both OSDs? Do I have to edit the CRUSH map to achieve real "RAID-1"-style replication? Is there a command to find out, for a specific file, where it actually resides and whether it has really been replicated?
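From what I've pieced together from the docs, something like this might show it, though I'm not sure I'm reading it right (file name and data pool name are just examples):

# inode number of a file on the mounted cluster, then converted to hex
ls -i /var/www/index.html
printf '%x\n' <inode number from above>
# CephFS stores file data as objects named <inode-hex>.<block number>;
# this prints the PG of the first object and the OSDs holding it (up/acting set)
ceph osd map cephfs_data <inode-hex>.00000000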

Thank you!
Ranjan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
