Re: PGs stuck active+remapped and osds lose data?!

Your current problem has nothing to do with clients and neither does
choose_total_tries.

Try setting just this value to 100 and see if your situation improves.
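
If it helps, the usual way to change just that one tunable is to pull the
crushmap, edit it, and inject it back; roughly like this (a sketch, the
file names are only examples):

  # dump the current crushmap and decompile it to text
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt

  # in crushmap.txt, change (or add) the line:
  #   tunable choose_total_tries 100

  # recompile the edited map and inject it
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new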

Ultimately you need to take a good look at your cluster configuration
and at how your crush map is configured to deal with that configuration,
but start with choose_total_tries, as it has the highest probability of
helping your situation. Your clients should not be affected.

Could you explain the reasoning behind having three hosts with one osd
each, one host with two osds, and one with four?

You likely need to tweak your crushmap to handle this configuration
better or, preferably, move to a more uniform configuration.
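
You can also sanity-check an edited crushmap offline before injecting it,
for example with crushtool's test mode (rule id 0 and 3 replicas below are
just assumptions, adjust them to your pools):

  # map 1024 sample inputs through rule 0 with 3 replicas and report
  # any input that could not be mapped to the requested number of osds
  crushtool -i crushmap.new --test --rule 0 --num-rep 3 \
      --min-x 0 --max-x 1023 --show-bad-mappings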


On Wed, Jan 11, 2017 at 5:38 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
> I have to thank you all. You give free support and this already helps me.
> I'm not someone who knows ceph that well, but every day it's getting better
> and better ;-)
>
> According to the article Brad posted I have to change the ceph osd crush
> tunables. But there are two questions left as I already wrote:
>
> - According to
> http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables there
> are a few profiles. The profile I need would be BOBTAIL (CRUSH_TUNABLES2),
> which would set choose_total_tries to 50; better than 19 for a start.
> There I also see: "You can select a profile on a running cluster with the
> command: ceph osd crush tunables {PROFILE}". My question on this is: even if
> I run hammer, is it fine and possible to set it to bobtail?
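>
> (For reference, I assume checking and switching the profile would look
> roughly like this:
>
>   # show the tunables currently in effect
>   ceph osd crush show-tunables
>
>   # switch the whole profile, as the docs describe
>   ceph osd crush tunables bobtail
>
> but please correct me if that is wrong.)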
>
> - We can also read:
>   WHICH CLIENT VERSIONS SUPPORT CRUSH_TUNABLES2
>   - v0.55 or later, including bobtail series (v0.56.x)
>   - Linux kernel version v3.9 or later (for the file system and RBD kernel
> clients)
>
> And my question here is: if my clients use librados (hammer version), do I
> need the required kernel version on the clients or on the ceph nodes?
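>
> (On a client I would check, for example:
>
>   # version of the installed ceph client packages / librados
>   ceph --version
>
>   # kernel version; as far as I understand this only matters for the
>   # kernel RBD / file system clients, not for librados
>   uname -r
>
> please correct me if I got that wrong.)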
>
> I don't want to end up with trouble on my clients. Can someone answer this
> for me before I change the settings?
>
>
> On 11.01.2017 at 06:47, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
>
>
> Yeah, Sam is correct. I had not looked at the crushmap, but I should have
> noticed what the trouble was by looking at `ceph osd tree`. That's my
> bad, sorry for that.
>
> Again please refer to:
>
> http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/
>
> Regards,
>
>
> On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>
> Shinobu isn't correct; you have 9/9 osds up and running. "up" does not
> equal "acting" because CRUSH is having trouble fulfilling the weights in
> your crushmap, so the acting set is being padded out with an extra osd
> which happens to have the data, in order to keep you at the right number
> of replicas. Please refer back to Brad's post.
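>
> If you want to see the difference yourself, something like this should
> show it (just an illustration, pg 9.7 is taken from your earlier output):
>
>   # list pgs with their up set and acting set side by side
>   ceph pg dump pgs_brief | grep remapped
>
>   # or look at a single pg in detail
>   ceph pg 9.7 query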
> -Sam
>
> On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller <mueller.marcus@xxxxxxxxx>
> wrote:
>
> Ok, I understand, but how can I debug why they are not running as they
> should? I thought everything was fine because ceph -s said they are up
> and running.
>
> I would think of a problem with the crush map.
>
> On 10.01.2017 at 08:06, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
>
> e.g.,
> OSD.7 / 3 / 0 are in the same acting set. They should all be up if they
> are running properly.
>
> # 9.7
> <snip>
>
>  "up": [
>      7,
>      3
>  ],
>  "acting": [
>      7,
>      3,
>      0
>  ],
>
> <snip>
>
> Here is an example:
>
> "up": [
>   1,
>   0,
>   2
> ],
> "acting": [
>   1,
>   0,
>   2
>  ],
>
> Regards,
>
>
> On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.marcus@xxxxxxxxx>
> wrote:
>
>
> That's not perfectly correct.
>
> OSD.0/1/2 seem to be down.
>
>
>
> Sorry, but where do you see this? I think this indicates that they are up:
> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?
>
>
> On 10.01.2017 at 07:50, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
>
> On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.marcus@xxxxxxxxx>
> wrote:
>
> All osds are currently up:
>
>   health HEALTH_WARN
>          4 pgs stuck unclean
>          recovery 4482/58798254 objects degraded (0.008%)
>          recovery 420522/58798254 objects misplaced (0.715%)
>          noscrub,nodeep-scrub flag(s) set
>   monmap e9: 5 mons at
> {ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
>          election epoch 478, quorum 0,1,2,3,4
> ceph1,ceph2,ceph3,ceph4,ceph5
>   osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>          flags noscrub,nodeep-scrub
>    pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
>          15070 GB used, 40801 GB / 55872 GB avail
>          4482/58798254 objects degraded (0.008%)
>          420522/58798254 objects misplaced (0.715%)
>               316 active+clean
>                 4 active+remapped
> client io 56601 B/s rd, 45619 B/s wr, 0 op/s
>
> This did not change for two days or so.
>
>
> By the way, my ceph osd df now looks like this:
>
> ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
> 0 1.28899  1.00000  3724G  1699G  2024G 45.63 1.69
> 1 1.57899  1.00000  3724G  1708G  2015G 45.87 1.70
> 2 1.68900  1.00000  3724G  1695G  2028G 45.54 1.69
> 3 6.78499  1.00000  7450G  1241G  6208G 16.67 0.62
> 4 8.39999  1.00000  7450G  1228G  6221G 16.49 0.61
> 5 9.51500  1.00000  7450G  1239G  6210G 16.64 0.62
> 6 7.66499  1.00000  7450G  1265G  6184G 16.99 0.63
> 7 9.75499  1.00000  7450G  2497G  4952G 33.52 1.24
> 8 9.32999  1.00000  7450G  2495G  4954G 33.49 1.24
>            TOTAL 55872G 15071G 40801G 26.97
> MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16
>
> As you can see, osd2 has now also gone down to 45% use and "lost" data. But I
> think this is not a problem and that ceph just cleans everything up after
> backfilling.
>
>
> On 10.01.2017 at 07:29, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:
>
> Looking at the ``ceph -s`` output you originally provided, all OSDs are up.
>
> osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
>
>
> But looking at ``pg query``, OSD.0 / 1 are not up.
>
> That's not perfectly correct.
>
> OSD.0/1/2 seem to be down.
>
> Are they something like related to this?:
>
> Ceph1, ceph2 and ceph3 are vms on one physical host
>
>
> Are those OSDs running on vm instances?
>
> # 9.7
> <snip>
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
>    7,
>    3
> ],
> "acting": [
>    7,
>    3,
>    0
> ],
>
> <snip>
>
> # 7.84
> <snip>
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
>    4,
>    8
> ],
> "acting": [
>    4,
>    8,
>    1
> ],
>
> <snip>
>
> # 8.1b
> <snip>
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
>    4,
>    7
> ],
> "acting": [
>    4,
>    7,
>    2
> ],
>
> <snip>
>
> # 7.7a
> <snip>
>
> "state": "active+remapped",
> "snap_trimq": "[]",
> "epoch": 3114,
> "up": [
>    7,
>    4
> ],
> "acting": [
>    7,
>    4,
>    2
> ],
>
> <snip>
>
>
>
>



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



