Hi
I'm new to Ceph, but I have the honor of looking after a cluster that I didn't set up myself.
After a first pass through the Ceph docs and a first glimpse at our cluster I'm starting to worry about our setup,
so I need some advice and guidance here.
The setup is:
3 machines, each running a ceph-monitor.
All of them also host OSDs.
machine A:
2 OSDs, 3.6 TB each - consisting of 1 spinning disk each
3 OSDs, 3.6 TB each - each consisting of a 2-disk hardware RAID 0 (spinning disks)
3 OSDs, 1.8 TB each - each consisting of a 2-disk hardware RAID 0 (spinning disks)
machine B:
3 OSDs, 3.6 TB each - consisting of 1 spinning disk each
3 OSDs, 3.6 TB each - each consisting of a 2-disk hardware RAID 0 (spinning disks)
1 OSD, 1.8 TB - consisting of a 2-disk hardware RAID 0 (spinning disks)
3 OSDs, 0.7 TB each - consisting of 1 SSD each
machine C:
3 OSDs, 0.7 TB each - consisting of 1 SSD each
The spinning disks and the SSDs form two separate pools.
What worries me is that I've read "don't use RAID together with Ceph",
in combination with our pool size:
:~ ceph osd pool get <poolname> size
size: 2
From what I understand from the Ceph docs, size tells me "how many disks may fail" without losing the data of the whole pool.
Is that right? Or can HALF the OSDs fail (since all objects are duplicated)?
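While checking size I also dumped min_size, since as far as I understand that is what decides when the pool stops accepting I/O. A minimal sketch of how I'm reading both (the pool names are placeholders, just like <poolname> above, and I'm assuming the -f json output contains the queried key):

import json
import subprocess

def pool_setting(pool: str, key: str):
    """Fetch one pool parameter (e.g. size or min_size) from the cluster."""
    out = subprocess.run(
        ["ceph", "osd", "pool", "get", pool, key, "-f", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)[key]

for pool in ("spinning", "ssd"):  # placeholder pool names
    print(pool, "size =", pool_setting(pool, "size"),
          "min_size =", pool_setting(pool, "min_size"))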
Unfortunately I'm not very good at stochastics, but given a disk failure probability of 1% per year
I don't feel very safe with this setup. (How do I calculate the probability that two disks fail "at the same time"? Or does anybody have a rough number for that? See my back-of-the-envelope sketch below the OSD tree.)
That said, looking at our OSD tree it seems we do try to spread the objects between two peers:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-19 4.76700 root here_ssd
-15 2.38350 room 2_ssd
-14 2.38350 rack 2_ssd
-4 2.38350 host B_ssd
4 hdd 0.79449 osd.4 up 1.00000 1.00000
5 hdd 0.79449 osd.5 up 1.00000 1.00000
13 hdd 0.79449 osd.13 up 1.00000 1.00000
-18 2.38350 room 1_ssd
-17 2.38350 rack 1_ssd
-5 2.38350 host C_ssd
0 hdd 0.79449 osd.0 up 1.00000 1.00000
1 hdd 0.79449 osd.1 up 1.00000 1.00000
2 hdd 0.79449 osd.2 up 1.00000 1.00000
-1 51.96059 root here_spinning
-12 25.98090 room 2_spinning
-11 25.98090 rack 2_spinning
-2 25.98090 host B_spinning
3 hdd 3.99959 osd.3 up 1.00000 1.00000
8 hdd 3.99429 osd.8 up 1.00000 1.00000
9 hdd 3.99429 osd.9 up 1.00000 1.00000
10 hdd 3.99429 osd.10 up 1.00000 1.00000
11 hdd 1.99919 osd.11 up 1.00000 1.00000
12 hdd 3.99959 osd.12 up 1.00000 1.00000
20 hdd 3.99959 osd.20 up 1.00000 1.00000
-10 25.97969 room 1_spinning
-8 25.97969 rack l1_spinning
-3 25.97969 host A_spinning
6 hdd 3.99959 osd.6 up 1.00000 1.00000
7 hdd 3.99959 osd.7 up 1.00000 1.00000
14 hdd 3.99429 osd.14 up 1.00000 1.00000
15 hdd 3.99429 osd.15 up 1.00000 1.00000
16 hdd 3.99429 osd.16 up 1.00000 1.00000
17 hdd 1.99919 osd.17 up 1.00000 1.00000
18 hdd 1.99919 osd.18 up 1.00000 1.00000
19 hdd 1.99919 osd.19 up 1.00000 1.00000
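Here is the back-of-the-envelope sketch I mentioned above for the "two disks at the same time" question. The 1% annual failure rate, the 24 h recovery window and the disk count are pure assumptions on my part, so please correct the model if it is off:

# Assumptions (mine, not measured): each disk fails independently with an
# annual failure rate (AFR) of 1%; after a failure Ceph needs ~24 h to
# re-replicate; with size=2 a second failure inside that window can lose data.
# Note: a 2-disk RAID-0 OSD dies if either member dies, so its effective AFR
# is roughly twice the per-disk AFR.

afr = 0.01                # assumed annual failure rate per OSD device
recovery_hours = 24.0     # assumed time to re-replicate after an OSD dies
n_osds = 18               # assumed number of spinning-disk OSDs at risk

hours_per_year = 24 * 365
p_fail_in_window = afr * recovery_hours / hours_per_year   # per surviving OSD

# expected number of "first" failures per year across all OSDs
first_failures_per_year = n_osds * afr

# probability that at least one other OSD also fails inside the window
p_second = 1 - (1 - p_fail_in_window) ** (n_osds - 1)

double_failures_per_year = first_failures_per_year * p_second
print(f"P(second failure within {recovery_hours:.0f} h) ~ {p_second:.2e}")
print(f"Expected double failures per year        ~ {double_failures_per_year:.2e}")

If that reasoning is roughly right, the result scales linearly with the recovery window, so keeping re-replication fast seems to matter a lot with size=2.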
And the second question:
I tracked the disk usage of our OSDs over the last two weeks and it looks somewhat strange too:
while osd.14 and osd.20 are filled to well below 60%,
osd.9, osd.16 and osd.18 are well above 80%.
Graphing that shows pretty stable parallel lines, with no hint of convergence.
That's true for both the HDD and the SSD pool.
Why is that? Is it normal and okay, or is there a(nother) glitch in our config?
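To dig into that I'm comparing the PG count with the fill level per OSD. A small sketch, assuming ceph osd df -f json returns a top-level "nodes" list with "name", "pgs" and "utilization" per OSD (the field names might differ between releases):

import json
import subprocess

# Dump per-OSD placement-group count and utilization, most-filled first.
raw = subprocess.run(
    ["ceph", "osd", "df", "-f", "json"],
    check=True, capture_output=True, text=True,
).stdout

for osd in sorted(json.loads(raw)["nodes"],
                  key=lambda n: n["utilization"], reverse=True):
    print(f"{osd['name']:>8}  pgs={osd['pgs']:>4}  util={osd['utilization']:5.1f}%")

My current guess is that if the fill level simply tracks the PG count, this is just uneven CRUSH placement rather than a broken config, and something like the balancer module or ceph osd reweight-by-utilization might even it out - but I'd like a second opinion before touching anything.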
Any hints and comments are welcome.
TIA