Re: Ceph newbie thoughts and questions

For Gluster, when files are written into it as a mounted network gluster filesystem, it writes a lot of metadata for each object so that it knows everything it needs to about it for replication purposes. If you put the data manually on the brick, then it wouldn't be able to sync.

Correct, 3 mons, 2 mds, and 3 osd nodes are a good place to start. You can choose to use erasure coding with a 2:1 setup (k=2, m=1; the default if you create the pool with erasure coding options) or a replicated setup with size 3 (the default configuration).
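
As a rough sketch of creating those two pool types (wrapped in Python here just to keep the commands together; the pool names, PG counts and the "ec-2-1" profile name are placeholders I've picked, not anything from this thread):

    #!/usr/bin/env python3
    # Sketch: create an erasure-coded pool (k=2 data chunks, m=1 coding chunk,
    # i.e. the 2:1 setup) and a replicated pool with size 3.
    # Assumes a working admin keyring; names and PG counts are illustrative.
    import subprocess

    def ceph(*args):
        subprocess.run(["ceph", *args], check=True)

    # Erasure code profile with 2 data chunks and 1 coding chunk
    ceph("osd", "erasure-code-profile", "set", "ec-2-1", "k=2", "m=1")
    ceph("osd", "pool", "create", "data_ec", "64", "64", "erasure", "ec-2-1")

    # Replicated pool; size 3 is already the default, set explicitly here
    ceph("osd", "pool", "create", "data_rep", "64", "64", "replicated")
    ceph("osd", "pool", "set", "data_rep", "size", "3")

Roughly speaking, with only 3 osd nodes and host as the failure domain, k=2/m=1 is about the only erasure profile that fits, since each of the k+m = 3 chunks has to land on a different host.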

The mds data is stored in the cluster.  I have an erasure coded cephfs that has 9TB of data in it and the mds service uses 8k on disk (the size of the folder and the keyring).  This is in my home cluster and I run each node with 3 osds, a mon, and an mds.  I have replica pools and erasure coded pools based on which is right for the job.

Failover of the mds works seamlessly for the clients.  The docs recommend against hyper-converging services because if you do not have enough system resources, your daemons can crash or hang due to resource contention.  The time you will run into resource contention is while your cluster isn't healthy; most ceph daemons can use 2-3x more memory while the cluster isn't healthy compared to while it's health_ok.


On Thu, May 4, 2017, 4:17 AM Marcus <marcus.pedersen@xxxxxx> wrote:

Thank you very much for your answer David, just what I was after!

Just some additional questions to make things clear to me.
The mds do not need to be set up in odd numbers?
They can be set up as 1, 2, 3, 4 and so on, as needed?

You made the basics clear to me, so when I set up my first ceph fs I need as a start:
3 mons, 2 mds and 3 osds. (To be able to avoid a single point of failure.)

Is there a clear ratio/relation/approximation between osds and mds?
If I have, say, 100TB of disk for osds, do I need X GB of disk for mds?

About gluster: my machines are set up in a gluster cluster today, but the reason for considering ceph fs for these machines instead is that I have problems with replication that I have not been able to solve. Secondly, we are getting indications from our organisation that data use will expand very quickly, and that is where I see that ceph fs will suit us: easy to expand as needed.
Thanks to your description of gluster I will be able to reconfigure my gluster cluster and rsync to the mounted cluster. I have been rsyncing directly to the hard drive, and it is now obvious that this does not work (it worked fine as a single distributed server, but not as a replica). I just hadn't got this tip from anybody else. Thanks again!

We will start using ceph fs, because this goes hand in hand with our future needs.
 
Best regards
Marcus




On 04/05/17 06:30, David Turner wrote:
The clients will need to be able to contact the mons and the osds.  NEVER use 2 mons.  Mons form a quorum and work best in odd numbers (1, 3, 5, etc.); 1 mon is better than 2 mons.  It is better to remove the raid and add the individual disks as OSDs.  Ceph handles the redundancy through replica copies.  It is much better to have a third node for failure domain reasons, so you can have 3 copies of your data with 1 copy in each of the 3 servers.  The OSDs store their information as objects grouped into PGs that are assigned to the OSDs.  You would need to set up CephFS and rsync the data into it to migrate the data into ceph.
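
For the rsync step, a minimal sketch (assuming the kernel CephFS client; the monitor hostname, secret file and source path are placeholders, not from this thread) could look like:

    #!/usr/bin/env python3
    # Sketch: mount CephFS via the kernel client and copy the existing data in
    # through the mount, never directly onto the OSD disks.
    # Monitor address, secret file and source path are placeholders.
    import subprocess

    def run(*args):
        subprocess.run(list(args), check=True)

    # Mount the filesystem from one of the monitors
    run("mount", "-t", "ceph", "mon1.example.com:6789:/", "/mnt/cephfs",
        "-o", "name=admin,secretfile=/etc/ceph/admin.secret")

    # Copy the existing data over through the mount point
    run("rsync", "-aH", "--progress", "/srv/fileserver/", "/mnt/cephfs/")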

I don't usually recommend this, but you might prefer Gluster.  You would use the raided disks as the brick in each node.  Set it up to have 2 copies (it's better to have 3, but you only have 2 nodes).  Each server can then be used to NFS-export the gluster mount point.  The files are stored as flat files on the bricks, but you would still need to create the gluster volume first and then rsync the data into the mounted gluster instead of directly onto the disk.  With this you don't have to worry about the mon service, mds service, osd services, balancing the crush map, etc.  Gluster of course has its own complexities and limitations, but it might be closer to what you're looking for right now.
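
And an equivalent sketch for the Gluster route (hostnames, brick paths and the volume name are placeholders; the point is that the rsync goes into the mounted volume, not the brick):

    #!/usr/bin/env python3
    # Sketch: 2-way replicated Gluster volume using the existing raided disks
    # as one brick per server, then rsync into the mounted volume.
    import subprocess

    def run(*args):
        subprocess.run(list(args), check=True)

    # One brick per server; gluster may ask for confirmation here, since
    # replica-2 volumes are prone to split-brain
    run("gluster", "volume", "create", "data", "replica", "2",
        "server1:/bricks/data", "server2:/bricks/data")
    run("gluster", "volume", "start", "data")

    # Mount the volume and copy the data in through the mount point
    run("mount", "-t", "glusterfs", "server1:/data", "/mnt/gluster")
    run("rsync", "-aH", "/srv/fileserver/", "/mnt/gluster/")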

On Wed, May 3, 2017 at 4:06 PM Marcus Pedersén <marcus.pedersen@xxxxxx> wrote:

Hello everybody!


I am a newbie on ceph and I really like it and want to try it out.
I have a couple of thoughts and questions after reading documentation and need some help to see that I am on the right path.

Today I have two file servers in production that I want to start my ceph fs on and expand from that.
I want these servers to function as a failover cluster and as I see it I will be able to do it with ceph.

To get a failover cluster without a single point of failure I need at least 2 monitors, 2 mds and 2 osd (my existing file servers), right?
Today, both of the file servers use a raid on 8 disks. Do I format my raid as xfs and run my osds on the raid?
Or do I split up my raid and add the disks directly as osds?

When I connect clients to my ceph fs, are they talking only to the mds, or are the clients talking to the osds directly as well?
If the clients just talk to the mds, then the osds and the monitors could be on a separate network, with the mds connected both to the client network and the local "ceph" network.

Today we have about 11TB of data on these file servers; how do I move the data to the ceph fs? Is it possible to rsync to one of the osd disks, start the osd daemon and let it replicate itself?

Is it possible to set up the ceph fs with 2 mds, 2 monitors and 1 osd, and add the second osd later?
This is to be able to have one file server in production, configure and test ceph with the other, swap over to the ceph system, and when it is up and running add the second osd.

Of course I will test this out before I bring it to production.

Many thanks in advance!

Best regards
Marcus



--

Marcus Pedersén
System administrator


Interbull Centre
Department of Animal Breeding & Genetics — SLU
Box 7023, SE-750 07
Uppsala, Sweden

Visiting address:
Room 55614, Ulls väg 26, Ultuna
Uppsala
Sweden

Tel: +46-(0)18-67 1962

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
