Re: PGs stuck active+remapped and osds lose data?!


 



I have to thank you all. You give free support and this already helps me. I'm not someone who knows Ceph that well, but every day it's getting better and better ;-)

According to the article Brad posted, I have to change the CRUSH tunables. But two questions remain, as I already wrote:

- According to http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables there are a few profiles. The profile I need would be BOBTAIL (CRUSH_TUNABLES2), which would set choose_total_tries to 50; for a start, better than the default of 19. There I also read: "You can select a profile on a running cluster with the command: ceph osd crush tunables {PROFILE}". My question on this is: even though I run hammer, is it possible and a good idea to set it to bobtail?

- We can also read: 
  WHICH CLIENT VERSIONS SUPPORT CRUSH_TUNABLES2 
  - v0.55 or later, including bobtail series (v0.56.x) 
  - Linux kernel version v3.9 or later (for the file system and RBD kernel clients)

And here my question is: if my clients use librados (version hammer), do I need this required kernel version on the clients or on the ceph nodes?

I don't want to end up with trouble on my clients. Can someone answer this for me before I change the settings?
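For reference, here is a sketch of the commands involved (all of them exist in hammer; note that switching the profile will likely trigger some data movement while PGs re-peer):

```shell
# Show the tunables currently in effect:
ceph osd crush show-tunables

# Switch the whole profile (bobtail implies choose_total_tries = 50):
ceph osd crush tunables bobtail

# Alternative: change only choose_total_tries by editing the crushmap:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
#   ...edit "tunable choose_total_tries 19" to 50 in crushmap.txt...
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```

The crushtool route avoids flipping the other bobtail tunables (e.g. chooseleaf_descend_once) if only the retry count is wanted.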


On 11.01.2017, at 06:47, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:

Yeah, Sam is correct. I hadn't looked at the crushmap, but I should have
noticed what the trouble was by looking at `ceph osd tree`. That's my
bad, sorry for that.

Again please refer to:

http://www.anchor.com.au/blog/2013/02/pulling-apart-cephs-crush-algorithm/

Regards,


On Wed, Jan 11, 2017 at 1:50 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
Shinobu isn't correct, you have 9/9 osds up and running.  up does not
equal acting because crush is having trouble fulfilling the weights in
your crushmap and the acting set is being padded out with an extra osd
which happens to have the data to keep you up to the right number of
replicas.  Please refer back to Brad's post.
-Sam
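(As an aside, the PGs Sam describes, where the acting set is padded relative to the up set, can be listed directly; a sketch, assuming a hammer-era CLI:)

```shell
# PGs that have not been active+clean for longer than
# mon_pg_stuck_threshold, including the four remapped ones:
ceph pg dump_stuck unclean

# For a single PG, compare the "up" and "acting" arrays:
ceph pg 9.7 query | grep -A 4 -E '"(up|acting)"'
```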

On Mon, Jan 9, 2017 at 11:08 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
Ok, I understand, but how can I debug why they are not running as they should? I thought everything was fine because ceph -s said they are up and running.

I would think of a problem with the crush map.

On 10.01.2017, at 08:06, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:

e.g.,
OSD 7 / 3 / 0 are in the same acting set. They should all appear in the
up set, if they are properly running.

# 9.7
<snip>
 "up": [
     7,
     3
 ],
 "acting": [
     7,
     3,
     0
 ],
<snip>

Here is an example:

"up": [
  1,
  0,
  2
],
"acting": [
  1,
  0,
  2
 ],
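A minimal self-contained check of that difference (using the 9.7 values above and python3; not a live cluster query):

```shell
# Compare the "up" and "acting" sets from a pg query snippet (pg 9.7 above).
python3 - <<'EOF'
import json

snippet = '{"state": "active+remapped", "up": [7, 3], "acting": [7, 3, 0]}'
pg = json.loads(snippet)

up, acting = pg["up"], pg["acting"]
# OSDs currently serving the data but not in the CRUSH-computed up set:
padding = [osd for osd in acting if osd not in up]
print("up =", up, "acting =", acting, "padding =", padding)
EOF
```

This prints `up = [7, 3] acting = [7, 3, 0] padding = [0]`: OSD 0 is holding the third replica only because CRUSH could not fill the up set.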

Regards,


On Tue, Jan 10, 2017 at 3:52 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:

That's not perfectly correct.

OSD.0/1/2 seem to be down.


Sorry, but where do you see this? I think this indicates that they are up: osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs?


On 10.01.2017, at 07:50, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:

On Tue, Jan 10, 2017 at 3:44 PM, Marcus Müller <mueller.marcus@xxxxxxxxx> wrote:
All osds are currently up:

  health HEALTH_WARN
         4 pgs stuck unclean
         recovery 4482/58798254 objects degraded (0.008%)
         recovery 420522/58798254 objects misplaced (0.715%)
         noscrub,nodeep-scrub flag(s) set
  monmap e9: 5 mons at
{ceph1=192.168.10.3:6789/0,ceph2=192.168.10.4:6789/0,ceph3=192.168.10.5:6789/0,ceph4=192.168.60.6:6789/0,ceph5=192.168.60.11:6789/0}
         election epoch 478, quorum 0,1,2,3,4
ceph1,ceph2,ceph3,ceph4,ceph5
  osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs
         flags noscrub,nodeep-scrub
   pgmap v9981077: 320 pgs, 3 pools, 4837 GB data, 19140 kobjects
         15070 GB used, 40801 GB / 55872 GB avail
         4482/58798254 objects degraded (0.008%)
         420522/58798254 objects misplaced (0.715%)
              316 active+clean
                4 active+remapped
client io 56601 B/s rd, 45619 B/s wr, 0 op/s

This did not change for two days or so.


By the way, my ceph osd df now looks like this:

ID WEIGHT  REWEIGHT SIZE   USE    AVAIL  %USE  VAR
0 1.28899  1.00000  3724G  1699G  2024G 45.63 1.69
1 1.57899  1.00000  3724G  1708G  2015G 45.87 1.70
2 1.68900  1.00000  3724G  1695G  2028G 45.54 1.69
3 6.78499  1.00000  7450G  1241G  6208G 16.67 0.62
4 8.39999  1.00000  7450G  1228G  6221G 16.49 0.61
5 9.51500  1.00000  7450G  1239G  6210G 16.64 0.62
6 7.66499  1.00000  7450G  1265G  6184G 16.99 0.63
7 9.75499  1.00000  7450G  2497G  4952G 33.52 1.24
8 9.32999  1.00000  7450G  2495G  4954G 33.49 1.24
           TOTAL 55872G 15071G 40801G 26.97
MIN/MAX VAR: 0.61/1.70  STDDEV: 13.16

As you can see, osd2 has now also gone down to 45% use and "lost" data. But I
think this is not a problem and Ceph just cleans everything up after
backfilling.


On 10.01.2017, at 07:29, Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote:

Looking at the ``ceph -s`` output you originally provided, all OSDs are up.

osdmap e3114: 9 osds: 9 up, 9 in; 4 remapped pgs


That's not perfectly correct.

OSD.0/1/2 seem to be down.

But looking at ``pg query``, OSD.0 / 1 are not up. Are they something like related to?:

Ceph1, ceph2 and ceph3 are vms on one physical host


Are those OSDs running on vm instances?

# 9.7
<snip>

"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [
   7,
   3
],
"acting": [
   7,
   3,
   0
],

<snip>

# 7.84
<snip>

"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [
   4,
   8
],
"acting": [
   4,
   8,
   1
],

<snip>

# 8.1b
<snip>

"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [
   4,
   7
],
"acting": [
   4,
   7,
   2
],

<snip>

# 7.7a
<snip>

"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 3114,
"up": [
   7,
   4
],
"acting": [
   7,
   4,
   2
],

<snip>




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

