Re: Ceph newbie thoughts and questions

For Gluster, when files are written into it as a mounted network gluster filesystem, it writes a lot of metadata for each object so that it knows everything it needs to about it for replication purposes. If you put the data manually on the brick, then it wouldn't be able to sync.

Correct, 3 mons, 2 mds, and 3 osd nodes are a good place to start. You can choose to use erasure coding with a 2:1 setup (k=2, m=1; the default if you create the pool with erasure coding options) or a replicated setup with size 3 (the default configuration).
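
As a rough sketch of creating those two pool types (wrapped in Python here just to keep the commands together; the pool names, PG counts and the "ec-2-1" profile name are placeholders I've picked, not anything from this thread):

    #!/usr/bin/env python3
    # Sketch: create an erasure-coded pool (k=2 data chunks, m=1 coding chunk,
    # i.e. the 2:1 setup) and a replicated pool with size 3.
    # Assumes a working admin keyring; names and PG counts are illustrative.
    import subprocess

    def ceph(*args):
        subprocess.run(["ceph", *args], check=True)

    # Erasure code profile with 2 data chunks and 1 coding chunk
    ceph("osd", "erasure-code-profile", "set", "ec-2-1", "k=2", "m=1")
    ceph("osd", "pool", "create", "data_ec", "64", "64", "erasure", "ec-2-1")

    # Replicated pool; size 3 is already the default, set explicitly here
    ceph("osd", "pool", "create", "data_rep", "64", "64", "replicated")
    ceph("osd", "pool", "set", "data_rep", "size", "3")

Roughly speaking, with only 3 osd nodes and host as the failure domain, k=2/m=1 is about the only erasure profile that fits, since each of the k+m = 3 chunks has to land on a different host.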

The mds data is stored in the cluster.  I have an erasure coded cephfs that has 9TB of data in it and the mds service uses 8k on disk (the size of the folder and the keyring).  This is in my home cluster and I run each node with 3 osds, a mon, and an mds.  I have replica pools and erasure coded pools based on which is right for the job.

Failover of the mds works seamlessly for the clients.  The docs recommend against hyper-converging services because if you do not have enough system resources, your daemons can crash or hang due to resource contention.  The time you will run into resource contention is while your cluster isn't healthy; most ceph daemons can use 2-3x more memory while the cluster isn't healthy compared to while it's health_ok.


On Thu, May 4, 2017, 4:17 AM Marcus <marcus.pedersen@xxxxxx> wrote:

Thank you very much for your answer David, just what I was after!

Just some additional questions to make things clear to me.
The mds do not need to be set up in odd numbers?
They can be set up as 1, 2, 3, 4 and so on, as needed?

You made the basics clear to me, so when I set up my first ceph fs I need as a start:
3 mons, 2 mds and 3 osds. (To be able to avoid a single point of failure.)

Is there a clear ratio/relation/approximation between osds and mds?
If I have, say, 100TB of disk for osds, do I need X GB of disk for mds?

About gluster: my machines are set up in a gluster cluster today, but the reason for considering ceph fs for these machines instead is that I have problems with replication that I have not been able to solve. Secondly, we are getting indications from our organisation that data use will expand very quickly, and that is where I see that ceph fs will suit us: easy to expand as needed.
Thanks to your description of gluster I will be able to reconfigure my gluster cluster and rsync to the mounted cluster. I have been rsyncing directly to the hard drive, and it is now obvious that this does not work (it worked fine as a single distributed server, but not as a replica). I just hadn't got this tip from anybody else. Thanks again!

We will start using ceph fs, because this goes hand in hand with our future needs.
 
Best regards
Marcus




On 04/05/17 06:30, David Turner wrote:
The clients will need to be able to contact the mons and the osds.  NEVER use 2 mons.  Mons form a quorum and work best in odd numbers (1, 3, 5, etc.); 1 mon is better than 2 mons.  It is better to remove the raid and add the individual disks as OSDs.  Ceph handles the redundancy through replica copies.  It is much better to have a third node for failure domain reasons, so you can have 3 copies of your data with 1 copy in each of the 3 servers.  The OSDs store their information as objects grouped into PGs that are assigned to the OSDs.  You would need to set up CephFS and rsync the data into it to migrate the data into ceph.
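
For the rsync step, a minimal sketch (assuming the kernel CephFS client; the monitor hostname, secret file and source path are placeholders, not from this thread) could look like:

    #!/usr/bin/env python3
    # Sketch: mount CephFS via the kernel client and copy the existing data in
    # through the mount, never directly onto the OSD disks.
    # Monitor address, secret file and source path are placeholders.
    import subprocess

    def run(*args):
        subprocess.run(list(args), check=True)

    # Mount the filesystem from one of the monitors
    run("mount", "-t", "ceph", "mon1.example.com:6789:/", "/mnt/cephfs",
        "-o", "name=admin,secretfile=/etc/ceph/admin.secret")

    # Copy the existing data over through the mount point
    run("rsync", "-aH", "--progress", "/srv/fileserver/", "/mnt/cephfs/")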

I don't usually recommend this, but you might prefer Gluster.  You would use the raided disks as the brick in each node.  Set it up to have 2 copies (it's better to have 3, but you only have 2 nodes).  Each server can then be used to NFS-export the gluster mount point.  The files are stored as flat files on the bricks, but you would still need to create the gluster volume first and then rsync the data into the mounted gluster instead of directly onto the disk.  With this you don't have to worry about the mon service, mds service, osd services, balancing the crush map, etc.  Gluster of course has its own complexities and limitations, but it might be closer to what you're looking for right now.
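
And an equivalent sketch for the Gluster route (hostnames, brick paths and the volume name are placeholders; the point is that the rsync goes into the mounted volume, not the brick):

    #!/usr/bin/env python3
    # Sketch: 2-way replicated Gluster volume using the existing raided disks
    # as one brick per server, then rsync into the mounted volume.
    import subprocess

    def run(*args):
        subprocess.run(list(args), check=True)

    # One brick per server; gluster may ask for confirmation here, since
    # replica-2 volumes are prone to split-brain
    run("gluster", "volume", "create", "data", "replica", "2",
        "server1:/bricks/data", "server2:/bricks/data")
    run("gluster", "volume", "start", "data")

    # Mount the volume and copy the data in through the mount point
    run("mount", "-t", "glusterfs", "server1:/data", "/mnt/gluster")
    run("rsync", "-aH", "/srv/fileserver/", "/mnt/gluster/")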

On Wed, May 3, 2017 at 4:06 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

Hello everybody!


I am a newbie on ceph and I really like it and want to try it out.
I have a couple of thoughts and questions after reading documentation and need some help to see that I am on the right path.

Today I have two file servers in production that I want to start my ceph fs on and expand from that.
I want these servers to function as a failover cluster and as I see it I will be able to do it with ceph.

To get a failover cluster without a single point of failure I need at least 2 monitors, 2 mds and 2 osd (my existing file servers), right?
Today, both of the file servers use a raid on 8 disks. Do I format my raid as xfs and run my osds on the raid?
Or do I split up my raid and add the disks directly as osds?

When I connect clients to my ceph fs, are they talking only to the mds, or are the clients talking to the osds directly as well?
If the clients just talk to the mds, then the osds and the monitors could be on a separate network, with the mds connected both to the client network and the local "ceph" network.

Today we have about 11TB of data on these file servers; how do I move the data to the ceph fs? Is it possible to rsync to one of the osd disks, start the osd daemon and let it replicate itself?

Is it possible to set up the ceph fs with 2 mds, 2 monitors and 1 osd, and add the second osd later?
This is to be able to have one file server in production, configure and test ceph with the other, swap over to the ceph system, and when it is up and running add the second osd.

Of course I will test this out before I bring it to production.

Many thanks in advance!

Best regards
Marcus



--

Marcus Pedersén
System administrator


Interbull Centre
Department of Animal Breeding & Genetics — SLU
Box 7023, SE-750 07
Uppsala, Sweden

Visiting address:
Room 55614, Ulls väg 26, Ultuna
Uppsala
Sweden

Tel: +46-(0)18-67 1962

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
