Re: two OSDs stuck peering after starting an OSD for recovery

Hi Dominik,

What about performance on osd.87 at the moment - do you have any related measurements?
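
For example, something like this on the host carrying osd.87 would already help (the admin socket path below is the stock one; adjust if yours differs):

iostat -x 5
ceph --admin-daemon /var/run/ceph/ceph-osd.87.asok perf dump
ceph --admin-daemon /var/run/ceph/ceph-osd.87.asok dump_ops_in_flight

The first shows disk utilisation and await on that host, the second the osd's internal op latency counters, and the last any requests currently stuck inside the osd (if your build has that command).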

As for my variant of this issue, it looks like the quorum somehow degrades over time - when I restarted the mons (one at a time, roughly as sketched below), the problem went away and peering times dropped by a factor of ten or so.  It also seems the problem accumulates in the quorum - I did disk replacements over the last week, and every time peering got worse and worse.  I think it's time to put more or less formalized problems into the bug tracker:
- the degradation over time plus the stuck placement groups,
- a newer kind of problem, also related to the epochs - restarting one mon results in a slight data-placement change at the moment the _first rebooted_ monitor comes back up; it does not show up when there is a one-hour delay between quorum restarts.
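
(The restart itself was nothing special - roughly, on each mon host in turn:

service ceph restart mon.a
ceph quorum_status

i.e. restart via the init script, wait until quorum_status lists all mons again, and only then move on to the next one. The mon id and init-script syntax are of course specific to my setup.)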




On Tue, Jul 2, 2013 at 1:37 PM, Dominik Mostowiec <dominikmostowiec@xxxxxxxxx> wrote:
Hi,
I got it.

ceph health detail
HEALTH_WARN 3 pgs peering; 3 pgs stuck inactive; 5 pgs stuck unclean;
recovery 64/38277874 degraded (0.000%)
pg 5.df9 is stuck inactive for 138669.746512, current state peering,
last acting [87,2,151]
pg 5.a82 is stuck inactive for 138638.121867, current state peering,
last acting [151,87,42]
pg 5.80d is stuck inactive for 138621.069523, current state peering,
last acting [151,47,87]
pg 5.df9 is stuck unclean for 138669.746761, current state peering,
last acting [87,2,151]
pg 5.ae2 is stuck unclean for 139479.810499, current state active,
last acting [87,151,28]
pg 5.7b6 is stuck unclean for 139479.693271, current state active,
last acting [87,105,2]
pg 5.a82 is stuck unclean for 139479.713859, current state peering,
last acting [151,87,42]
pg 5.80d is stuck unclean for 139479.800820, current state peering,
last acting [151,47,87]
pg 5.df9 is peering, acting [87,2,151]
pg 5.a82 is peering, acting [151,87,42]
pg 5.80d is peering, acting [151,47,87]
recovery 64/38277874 degraded (0.000%)


Output of ceph pg 5.df9 query:
{ "state": "peering",
  "up": [
        87,
        2,
        151],
  "acting": [
        87,
        2,
        151],
  "info": { "pgid": "5.df9",
      "last_update": "119454'58844953",
      "last_complete": "119454'58844953",
      "log_tail": "119454'58843952",
      "last_backfill": "MAX",
      "purged_snaps": "[]",
      "history": { "epoch_created": 365,
          "last_epoch_started": 119456,
          "last_epoch_clean": 119456,
          "last_epoch_split": 117806,
          "same_up_since": 119458,
          "same_interval_since": 119458,
          "same_primary_since": 119458,
          "last_scrub": "119442'58732630",
          "last_scrub_stamp": "2013-06-29 20:02:24.817352",
          "last_deep_scrub": "119271'57224023",
          "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
          "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352"},
      "stats": { "version": "119454'58844953",
          "reported": "119458'42382189",
          "state": "peering",
          "last_fresh": "2013-06-30 20:35:29.489826",
          "last_change": "2013-06-30 20:35:28.469854",
          "last_active": "2013-06-30 20:33:24.126599",
          "last_clean": "2013-06-30 20:33:24.126599",
          "last_unstale": "2013-06-30 20:35:29.489826",
          "mapping_epoch": 119455,
          "log_start": "119454'58843952",
          "ondisk_log_start": "119454'58843952",
          "created": 365,
          "last_epoch_clean": 365,
          "parent": "0.0",
          "parent_split_bits": 0,
          "last_scrub": "119442'58732630",
          "last_scrub_stamp": "2013-06-29 20:02:24.817352",
          "last_deep_scrub": "119271'57224023",
          "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
          "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352",
          "log_size": 135341,
          "ondisk_log_size": 135341,
          "stats_invalid": "0",
          "stat_sum": { "num_bytes": 1010563373,
              "num_objects": 3099,
              "num_object_clones": 0,
              "num_object_copies": 0,
              "num_objects_missing_on_primary": 0,
              "num_objects_degraded": 0,
              "num_objects_unfound": 0,
              "num_read": 302,
              "num_read_kb": 0,
              "num_write": 32264,
              "num_write_kb": 798650,
              "num_scrub_errors": 0,
              "num_objects_recovered": 8235,
              "num_bytes_recovered": 2085653757,
              "num_keys_recovered": 249061471},
          "stat_cat_sum": {},
          "up": [
                87,
                2,
                151],
          "acting": [
                87,
                2,
                151]},
      "empty": 0,
      "dne": 0,
      "incomplete": 0,
      "last_epoch_started": 119454},
  "recovery_state": [
        { "name": "Started\/Primary\/Peering\/GetLog",
          "enter_time": "2013-06-30 20:35:28.545478",
          "newest_update_osd": 2},
        { "name": "Started\/Primary\/Peering",
          "enter_time": "2013-06-30 20:35:28.469841",
          "past_intervals": [
                { "first": 119453,
                  "last": 119454,
                  "maybe_went_rw": 1,
                  "up": [
                        87,
                        2,
                        151],
                  "acting": [
                        87,
                        2,
                        151]},
                { "first": 119455,
                  "last": 119457,
                  "maybe_went_rw": 1,
                  "up": [
                        2,
                        151],
                  "acting": [
                        2,
                        151]}],
          "probing_osds": [
                2,
                87,
                151],
          "down_osds_we_would_probe": [],
          "peering_blocked_by": []},
        { "name": "Started",
          "enter_time": "2013-06-30 20:35:28.469765"}]}


For other PGs: https://www.dropbox.com/s/q5iv8lwzecioy3d/pg_query.tar.tz

--
Regards
Dominik

2013/6/30 Andrey Korolyov <andrey@xxxxxxx>:
> That's not actually a loop as it first looked, sorry - I have reproduced
> the issue many times and in most cases there is no such cpu-eating
> behaviour, only the locked pgs are present. Also I can celebrate the
> return of the 'wrongly marked down' bug, at least on the 0.61.4 tag. For
> the first one I'll send a link to a core dump as soon as I'm able to
> reproduce it on my test env; the second one is linked to 100% disk
> utilization, so I'm not sure whether that behaviour is right or wrong.
>
> On Sat, Jun 29, 2013 at 1:28 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> On Sat, 29 Jun 2013, Andrey Korolyov wrote:
>>> There is almost the same problem with the 0.61 cluster, at least with the
>>> same symptoms. It can be reproduced quite easily - remove an osd and then
>>> mark it out, and with quite high probability one of its neighbours will
>>> be stuck at the end of the peering process with a couple of peering pgs
>>> whose primary copy is on it. Such an osd process seems to be stuck in
>>> some kind of lock, eating exactly 100% of one core.
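>>>
>>> Roughly (the osd id here is just an example):
>>>
>>>   service ceph stop osd.12      # on the host carrying that osd
>>>   ceph osd out 12               # mark it out so data starts remapping
>>>   ceph -s                       # a few pgs stay stuck in 'peering'
>>>   top                           # a neighbouring ceph-osd pinned at 100% of one core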
>>
>> Which version?
>> Can you attach with gdb and get a backtrace to see what it is chewing on?
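>>
>> For example (assuming debug symbols for ceph-osd are installed):
>>
>>   gdb -p <pid of the spinning ceph-osd>
>>   (gdb) thread apply all bt
>>   (gdb) detach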
>>
>> Thanks!
>> sage
>>
>>
>>>
>>> On Thu, Jun 13, 2013 at 8:42 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> > On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
>>> >> Hi, sorry for late response.
>>> >>
>>> >> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
>>> >>
>>> >> Logs in attachment, and on google drive, from today.
>>> >>
>>> >> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
>>> >>
>>> >> We have such problem today. And new logs are on google drive with today date.
>>> >>
>>> >> What is strange is that the problematic osd.71 has about 10-15% more
>>> >> space used than the other osds in the cluster.
>>> >>
>>> >> Today, within one hour, osd.71 failed 3 times in the mon log, and after
>>> >> the third failure recovery got stuck, and many 500 errors appeared in the
>>> >> http layer on top of rgw. When it is stuck, restarting osd.71, osd.23 and
>>> >> osd.108 - all from the stuck pg - helps, but I even ran a repair on this
>>> >> osd just in case, roughly as sketched below.
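>>> >>
>>> >> (Roughly, with the init-script syntax we use here:
>>> >>
>>> >>   /etc/init.d/ceph restart osd.71   # likewise osd.23 and osd.108 on their hosts
>>> >>   ceph osd repair 71                # ask the osd to scrub and repair its pgs
>>> >> )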
>>> >>
>>> >> I have a theory that this pg holds the rgw index of objects, or that one
>>> >> of the osds in this pg has some problem with its local filesystem or the
>>> >> drive below it (the raid controller reports nothing about that), but I do
>>> >> not see any problem in the system.
>>> >>
>>> >> How can we find in which pg/osd the index of objects for an rgw bucket lives?
>>> >
>>> > You can find the location of any named object by grabbing the OSD map
>>> > from the cluster and using the osdmaptool: "osdmaptool <mapfile>
>>> > --test-map-object <objname> --pool <poolid>".
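>>> >
>>> > For example, something like this (the pool id and object name below are
>>> > placeholders; for a bucket index the object is usually named
>>> > .dir.<bucket marker>, and radosgw-admin bucket stats --bucket=<name>
>>> > shows the marker):
>>> >
>>> >   ceph osd getmap -o /tmp/osdmap
>>> >   osdmaptool /tmp/osdmap --test-map-object .dir.<bucket marker> --pool <index pool id>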
>>> >
>>> > You're not providing any context for your issue though, so we really
>>> > can't help. What symptoms are you observing?
>>> > -Greg
>>> > Software Engineer #42 @ http://inktank.com | http://ceph.com



--
Regards
Dominik

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
