I have to thank you all. You give free support, and this already helps me. I'm not someone who knows Ceph that well, but every day it's getting better and better ;-)
According to the article Brad posted, I have to change the ceph osd crush tunables. But, as I already wrote, two questions are left:
- According to http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables there are a few profiles. The profile I need would be BOBTAIL (CRUSH_TUNABLES2), which would set choose_total_tries to 50. For the beginning, better than 19. There I also see: "You can select a profile on a running cluster with the command: ceph osd crush tunables {PROFILE}". My question on this: even though I run hammer, is it possible and a good idea to set it to bobtail?
- We can also read: "WHICH CLIENT VERSIONS SUPPORT CRUSH_TUNABLES2: v0.55 or later, including the bobtail series (v0.56.x); Linux kernel version v3.9 or later (for the file system and RBD kernel clients)".
And here my question is: if my clients use librados (version hammer), do I need this required kernel version on the clients or on the ceph nodes?
I don't want to run into trouble with my clients in the end. Can someone answer me this before I change the settings?
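For what it's worth, the tunables change described in the linked docs boils down to a couple of commands. A sketch, assuming admin access to the cluster; note that switching profiles will trigger data movement:

```shell
# Show the tunables currently in effect (JSON output)
ceph osd crush show-tunables

# Switch the running cluster to the bobtail profile
# (raises choose_total_tries to 50, among other settings;
# expect backfilling afterwards)
ceph osd crush tunables bobtail

# Confirm the new values took effect
ceph osd crush show-tunables
```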
Yeah, Sam is correct. I hadn't looked at the crushmap, but I should have noticed what was troublesome from looking at `ceph osd tree`. That's my bad, sorry for that. Again, please refer to: http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/

Regards,

On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:

Shinobu isn't correct: you have 9/9 osds up and running. "up" does not equal "acting" because CRUSH is having trouble fulfilling the weights in your crushmap, and the acting set is being padded out with an extra osd which happens to have the data, to keep you at the right number of replicas. Please refer back to Brad's post.

-Sam
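Sam's distinction between "up" and "acting" can be sketched with a toy model (illustrative Python only, not Ceph code; the function name and padding logic are my own simplification):

```python
# Toy illustration of "up" vs "acting" sets (not actual Ceph code).
# For a pool with size=3, CRUSH here only manages to map 2 OSDs ("up"),
# so the acting set is padded with a former replica holder to keep the
# PG at full redundancy while it is reported as "remapped".

def acting_set(up, previous_holders, pool_size):
    """Pad the up set with previous data holders until pool_size is met."""
    acting = list(up)
    for osd in previous_holders:
        if len(acting) >= pool_size:
            break
        if osd not in acting:
            acting.append(osd)
    return acting

# PG 9.7 from the thread: CRUSH maps only [7, 3]; osd.0 still has the data.
up = [7, 3]
acting = acting_set(up, previous_holders=[0], pool_size=3)
print(up, acting)    # [7, 3] [7, 3, 0]
print(up != acting)  # True -> the PG shows up as active+remapped
```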
On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
Ok, I understand, but how can I debug why they are not running as they should? I thought everything was fine because `ceph -s` said they are up and running.
I would think of a problem with the crush map.
Am 10.01.2017 um 08:06 schrieb Shinobu Kinjo <skinjo@xxxxxxxxxx>:
e.g., OSD 7 / 3 / 0 are in the same acting set. They should be up if they are running properly.
# 9.7 <snip>
"up": [ 7, 3 ], "acting": [ 7, 3, 0 ],
<snip>
Here is an example:
"up": [ 1, 0, 2 ], "acting": [ 1, 0, 2 ],
Regards,
On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
That's not perfectly correct.
OSD.0/1/2 seem to be down.
Sorry, but where do you see this? I think this indicates that they are up: "osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs"?
Am 10.01.2017 um 07:50 schrieb Shinobu Kinjo <skinjo@xxxxxxxxxx>:
On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
All osds are currently up:
     health HEALTH_WARN
            4 pgs stuck unclean
            recovery 4482/58798254 objects degraded (0.008%)
            recovery 420522/58798254 objects misplaced (0.715%)
            noscrub,nodeep-scrub flag(s) set
     monmap e9: 5 mons at {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
            election epoch 478, quorum 0,1,2,3,4 ceph1,ceph2,ceph3,ceph4,ceph5
     osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
            flags noscrub,nodeep-scrub
      pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
            15070 GB used, 40801 GB / 55872 GB avail
            4482/58798254 objects degraded (0.008%)
            420522/58798254 objects misplaced (0.715%)
                 316 active+clean
                   4 active+remapped
  client io 56601 B/s rd, 45619 B/s wr, 0 op/s
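As a sanity check, the degraded and misplaced percentages in that status output are simply object-count ratios (plain arithmetic, using the numbers reported above):

```python
# Verify the percentages reported by `ceph -s` from the raw object counts.
total = 58798254       # total object instances across the cluster
degraded = 4482
misplaced = 420522

print(f"{degraded / total * 100:.3f}%")   # 0.008%
print(f"{misplaced / total * 100:.3f}%")  # 0.715%
```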
This did not change for two days or so.
By the way, my ceph osd df now looks like this:
ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
 0 1.28899 1.00000   3724G  1699G  2024G 45.63 1.69
 1 1.57899 1.00000   3724G  1708G  2015G 45.87 1.70
 2 1.68900 1.00000   3724G  1695G  2028G 45.54 1.69
 3 6.78499 1.00000   7450G  1241G  6208G 16.67 0.62
 4 8.39999 1.00000   7450G  1228G  6221G 16.49 0.61
 5 9.51500 1.00000   7450G  1239G  6210G 16.64 0.62
 6 7.66499 1.00000   7450G  1265G  6184G 16.99 0.63
 7 9.75499 1.00000   7450G  2497G  4952G 33.52 1.24
 8 9.32999 1.00000   7450G  2495G  4954G 33.49 1.24
              TOTAL 55872G 15071G 40801G 26.97
MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
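For reference, the VAR column here is each OSD's %USE divided by the cluster-average utilization (26.97 in this output); the values can be reproduced with plain arithmetic:

```python
# Reproduce the VAR column of `ceph osd df`: per-OSD %USE divided by
# the cluster-wide average utilization (TOTAL %USE, 26.97 here).
use_pct = {0: 45.63, 1: 45.87, 2: 45.54, 3: 16.67, 4: 16.49,
           5: 16.64, 6: 16.99, 7: 33.52, 8: 33.49}
avg = 26.97  # TOTAL %USE from the same output

var = {osd: round(u / avg, 2) for osd, u in use_pct.items()}
print(var[0], var[3], var[7])  # 1.69 0.62 1.24
print(min(var.values()), max(var.values()))  # 0.61 1.7
```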
As you can see, osd2 has now also gone down to 45% use and "lost" data. But I also think this is no problem and ceph just cleans everything up after backfilling.
Am 10.01.2017 um 07:29 schrieb Shinobu Kinjo <skinjo@xxxxxxxxxx>:
Looking at ``ceph -s`` you originally provided, all OSDs are up.
osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
That's not perfectly correct.
OSD.0/1/2 seem to be down.

But looking at ``pg query``, OSD.0 / 1 are not up. Are they something like related to?:

Ceph1, ceph2 and ceph3 are vms on one physical host

Are those OSDs running on vm instances?
# 9.7 <snip>
"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [ 7, 3 ],
"acting": [ 7, 3, 0 ],
<snip>

# 7.84 <snip>
"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [ 4, 8 ],
"acting": [ 4, 8, 1 ],
<snip>

# 8.1b <snip>
"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [ 4, 7 ],
"acting": [ 4, 7, 2 ],
<snip>

# 7.7a <snip>
"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [ 7, 4 ],
"acting": [ 7, 4, 2 ],
<snip>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com