Production System Evaluation / Problems

Hey guys,

 

we’re evaluating Ceph at the moment for a bigger production-ready implementation. So far we’ve had some success and some problems with it. In combination with Proxmox, Ceph works quite well out of the box. I’ve tried to cover my questions with existing answers and solutions, but I still find some things unclear. Here are the things I’m having problems with:

 

1.       The first question is just for my understanding: how does Ceph handle failure domains? From what I’ve read so far, I create a new CRUSH map with, for example, 2 datacenters; each DC has a rack, and in that rack a chassis with nodes. With my own CRUSH map, Ceph will "see" this hierarchy and distribute the data automatically. What I am missing is some finer control. For example, with a replica count of 3 I want Ceph to store the data twice in datacenter A and once in datacenter B. Furthermore, I want read access to stay within one datacenter (if possible and the data is available there) to keep the RTT low. Is this possible? A sketch of the kind of rule I have in mind is below.
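
Something like the following is what I imagine the placement rule would look like. This is only a sketch: the rule name is made up, "blade" is the custom host-level type from our hierarchy further down, and I’m not sure the firstn values are right.

    rule replicated_dc_split {
            ruleset 1
            type replicated
            min_size 3
            max_size 3
            # pick 2 OSDs on different blades in datacenter1 ...
            step take datacenter1
            step chooseleaf firstn 2 type blade
            step emit
            # ... and the rest from datacenter2 (firstn -2 = pool size minus 2, i.e. 1 here)
            step take datacenter2
            step chooseleaf firstn -2 type blade
            step emit
    }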

2.       I’ve built my own CRUSH map and tried to get it working, with no success at all. I’m literally "done with this s…" :) and that’s why I’m here right now. (How I edited and injected the map is sketched below, after the tree.) Here is the state of the cluster:

 

    cluster 42f04e55-0a3f-4644-8543-516cd46cd4e9
     health HEALTH_WARN
            79 pgs degraded
            262 pgs stale
            79 pgs stuck degraded
            262 pgs stuck stale
            512 pgs stuck unclean
            79 pgs stuck undersized
            79 pgs undersized
     monmap e8: 6 mons at {0=192.168.40.20:6789/0,1=192.168.40.21:6789/0,2=192.168.40.22:6789/0,3=192.168.40.23:6789/0,4=192.168.40.24:6789/0,5=192.168.40.25:6789/0}
            election epoch 86, quorum 0,1,2,3,4,5 0,1,2,3,4,5
     mdsmap e2: 0/0/1 up
     osdmap e212: 6 osds: 5 up, 5 in; 250 remapped pgs
      pgmap v366013: 512 pgs, 2 pools, 0 bytes data, 0 objects
            278 MB used, 900 GB / 901 GB avail
                 250 active+remapped
                 183 stale+active+remapped
                  79 stale+active+undersized+degraded+remapped

 

Here is the config:

ID  WEIGHT  TYPE NAME                  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-27 1.07997 root default
-25 0.53998     datacenter datacenter1
-23 0.53998         chassis chassis1
 -1 0.17999             blade blade3
  0 0.17999                 osd.0         down        0          1.00000
 -2 0.17999             blade blade4
  1 0.17999                 osd.1           up  1.00000          1.00000
 -3 0.17999             blade blade5
  2 0.17999                 osd.2           up  1.00000          1.00000
-26 0.53999     datacenter datacenter2
-24 0.53999         chassis chassis2
-17 0.17999             blade blade17
  3 0.17999                 osd.3           up  0.95001          1.00000
-18 0.17999             blade blade18
  4 0.17999                 osd.4           up  1.00000          1.00000
-19 0.17999             blade blade19
  5 0.17999                 osd.5           up  1.00000          1.00000
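
For completeness, this is roughly how I edited and injected the map; typed from memory, and the file names are just examples:

    # dump the current CRUSH map and decompile it to text
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt (buckets and rule), then recompile and inject it
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new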

 

I simply can’t get osd.0 back up. I took it offline, marked it out, reinserted it, set it up again, deleted the OSD configs and recreated them, with no success whatsoever. IMHO the documentation on this part is a bit "lousy", so I’m missing some information here, sorry folks. The rough procedure I followed is sketched below.
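
Again from memory, and with placeholder device paths, this is approximately what I did to remove and recreate the OSD:

    # stop the daemon and remove the OSD from the cluster, CRUSH map and auth
    service ceph stop osd.0        # or: systemctl stop ceph-osd@0
    ceph osd out 0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0
    # recreate it (/dev/sdb is a placeholder; on Proxmox, pveceph createosd also works)
    ceph-disk prepare /dev/sdb
    ceph-disk activate /dev/sdb1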

 

3.       Last but not least, I would like to know whether it is a good idea to instead run the data network and the config network on 2 dedicated NICs in 2 dedicated VLANs. Our hardware is redundant and we have 10 Gbit fibre optics in-house and 80 Gbit between the two datacenters. The data VLAN uses jumbo frames while the others don’t. A ceph.conf sketch of what I mean is below.
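
In ceph.conf terms, this is what I mean; the cluster subnet is a placeholder, only the public one matches our mons:

    [global]
        # client / "config" traffic (mons, clients) on its own VLAN
        public network  = 192.168.40.0/24
        # replication / "data" traffic on a dedicated VLAN with jumbo frames
        cluster network = 192.168.41.0/24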

 

4.       Do you guys have some kind of "best practice" book available for large-scale deployments? 20+ servers, up to 100+ and 1000+?

 

Regards

 

Florian

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
