Re: two OSDs stuck on peering after starting an OSD for recovery

Hi,
I got it.

ceph health detail
HEALTH_WARN 3 pgs peering; 3 pgs stuck inactive; 5 pgs stuck unclean;
recovery 64/38277874 degraded (0.000%)
pg 5.df9 is stuck inactive for 138669.746512, current state peering,
last acting [87,2,151]
pg 5.a82 is stuck inactive for 138638.121867, current state peering,
last acting [151,87,42]
pg 5.80d is stuck inactive for 138621.069523, current state peering,
last acting [151,47,87]
pg 5.df9 is stuck unclean for 138669.746761, current state peering,
last acting [87,2,151]
pg 5.ae2 is stuck unclean for 139479.810499, current state active,
last acting [87,151,28]
pg 5.7b6 is stuck unclean for 139479.693271, current state active,
last acting [87,105,2]
pg 5.a82 is stuck unclean for 139479.713859, current state peering,
last acting [151,87,42]
pg 5.80d is stuck unclean for 139479.800820, current state peering,
last acting [151,47,87]
pg 5.df9 is peering, acting [87,2,151]
pg 5.a82 is peering, acting [151,87,42]
pg 5.80d is peering, acting [151,47,87]
recovery 64/38277874 degraded (0.000%)
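
The per-PG detail below is the output of "ceph pg <pgid> query". A minimal
sketch to collect it for all three stuck PGs at once (the output path is only
an example):

  for pg in 5.df9 5.a82 5.80d; do
      ceph pg $pg query > /tmp/pg_query.$pg.json   # full peering/recovery state per PG
  done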


pg query for 5.df9 (ceph pg 5.df9 query):
{ "state": "peering",
  "up": [
        87,
        2,
        151],
  "acting": [
        87,
        2,
        151],
  "info": { "pgid": "5.df9",
      "last_update": "119454'58844953",
      "last_complete": "119454'58844953",
      "log_tail": "119454'58843952",
      "last_backfill": "MAX",
      "purged_snaps": "[]",
      "history": { "epoch_created": 365,
          "last_epoch_started": 119456,
          "last_epoch_clean": 119456,
          "last_epoch_split": 117806,
          "same_up_since": 119458,
          "same_interval_since": 119458,
          "same_primary_since": 119458,
          "last_scrub": "119442'58732630",
          "last_scrub_stamp": "2013-06-29 20:02:24.817352",
          "last_deep_scrub": "119271'57224023",
          "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
          "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352"},
      "stats": { "version": "119454'58844953",
          "reported": "119458'42382189",
          "state": "peering",
          "last_fresh": "2013-06-30 20:35:29.489826",
          "last_change": "2013-06-30 20:35:28.469854",
          "last_active": "2013-06-30 20:33:24.126599",
          "last_clean": "2013-06-30 20:33:24.126599",
          "last_unstale": "2013-06-30 20:35:29.489826",
          "mapping_epoch": 119455,
          "log_start": "119454'58843952",
          "ondisk_log_start": "119454'58843952",
          "created": 365,
          "last_epoch_clean": 365,
          "parent": "0.0",
          "parent_split_bits": 0,
          "last_scrub": "119442'58732630",
          "last_scrub_stamp": "2013-06-29 20:02:24.817352",
          "last_deep_scrub": "119271'57224023",
          "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
          "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352",
          "log_size": 135341,
          "ondisk_log_size": 135341,
          "stats_invalid": "0",
          "stat_sum": { "num_bytes": 1010563373,
              "num_objects": 3099,
              "num_object_clones": 0,
              "num_object_copies": 0,
              "num_objects_missing_on_primary": 0,
              "num_objects_degraded": 0,
              "num_objects_unfound": 0,
              "num_read": 302,
              "num_read_kb": 0,
              "num_write": 32264,
              "num_write_kb": 798650,
              "num_scrub_errors": 0,
              "num_objects_recovered": 8235,
              "num_bytes_recovered": 2085653757,
              "num_keys_recovered": 249061471},
          "stat_cat_sum": {},
          "up": [
                87,
                2,
                151],
          "acting": [
                87,
                2,
                151]},
      "empty": 0,
      "dne": 0,
      "incomplete": 0,
      "last_epoch_started": 119454},
  "recovery_state": [
        { "name": "Started\/Primary\/Peering\/GetLog",
          "enter_time": "2013-06-30 20:35:28.545478",
          "newest_update_osd": 2},
        { "name": "Started\/Primary\/Peering",
          "enter_time": "2013-06-30 20:35:28.469841",
          "past_intervals": [
                { "first": 119453,
                  "last": 119454,
                  "maybe_went_rw": 1,
                  "up": [
                        87,
                        2,
                        151],
                  "acting": [
                        87,
                        2,
                        151]},
                { "first": 119455,
                  "last": 119457,
                  "maybe_went_rw": 1,
                  "up": [
                        2,
                        151],
                  "acting": [
                        2,
                        151]}],
          "probing_osds": [
                2,
                87,
                151],
          "down_osds_we_would_probe": [],
          "peering_blocked_by": []},
        { "name": "Started",
          "enter_time": "2013-06-30 20:35:28.469765"}]}


For other PGs: https://www.dropbox.com/s/q5iv8lwzecioy3d/pg_query.tar.tz
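
The recovery_state above shows the PG parked in Started/Primary/Peering/GetLog
with newest_update_osd 2, so the primary appears to be waiting on osd.2 for its
log. A quick way to see what that OSD is doing (a sketch only: the admin-socket
path assumes the default location, and pick the right ceph-osd pid if several
run on the host), along the lines of the gdb backtrace Sage asks for further
down in the thread:

  # on the host running osd.2
  ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_ops_in_flight
  # if the process spins at 100% of one core, a backtrace shows where:
  gdb -batch -ex 'thread apply all bt' -p <ceph-osd pid>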

--
Regards
Dominik

2013/6/30 Andrey Korolyov <andrey@xxxxxxx>:
> That's not a loop as it looked, sorry - I have reproduced the issue many
> times and there is no such CPU-eating behavior in most cases; only the
> locked PGs are present. Also, the 'wrongly marked down' bug seems to be
> back, at least in the 0.61.4 tag. For the first issue I'll send a link to
> a core dump as soon as I can reproduce it on my test environment; the
> second one is tied to 100% disk utilization, so I'm not sure whether that
> behavior is right or wrong.
>
> On Sat, Jun 29, 2013 at 1:28 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> On Sat, 29 Jun 2013, Andrey Korolyov wrote:
>>> There is almost the same problem with the 0.61 cluster, at least with the
>>> same symptoms. It can be reproduced quite easily: remove an OSD and then
>>> mark it out, and with quite high probability one of its neighbors will get
>>> stuck at the end of the peering process with a couple of peering PGs whose
>>> primary copy is on it. That OSD process seems to be stuck in some kind of
>>> lock, eating exactly 100% of one core.
>>
>> Which version?
>> Can you attach with gdb and get a backtrace to see what it is chewing on?
>>
>> Thanks!
>> sage
>>
>>
>>>
>>> On Thu, Jun 13, 2013 at 8:42 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> > On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szibis@xxxxxxxxx> wrote:
>>> >> Hi, sorry for the late response.
>>> >>
>>> >> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
>>> >>
>>> >> Logs are attached and on Google Drive, from today.
>>> >>
>>> >> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
>>> >>
>>> >> We hit this problem again today, and new logs are on Google Drive with today's date.
>>> >>
>>> >> What is strange is that the problematic osd.71 has about 10-15% more
>>> >> space used than the other OSDs in the cluster.
>>> >>
>>> >> Today osd.71 failed 3 times within one hour in the mon log; after the
>>> >> third failure recovery got stuck and many 500 errors appeared in the
>>> >> HTTP layer on top of rgw. When it is stuck, restarting osd.71, osd.23,
>>> >> and osd.108 (all from the stuck PG) helps, but I even ran a repair on
>>> >> this OSD just in case.
>>> >>
>>> >> My theory is that the rgw object index lives on this PG, or that one of
>>> >> the OSDs in this PG has a problem with its local filesystem or the
>>> >> drive below it (the RAID controller reports nothing), but I do not see
>>> >> any problem in the system.
>>> >>
>>> >> How can we find which PG/OSD holds the object index of an rgw bucket?
>>> >
>>> > You can find the location of any named object by grabbing the OSD map
>>> > from the cluster and using the osdmaptool: "osdmaptool <mapfile>
>>> > --test-map-object <objname> --pool <poolid>".
>>> >
>>> > You're not providing any context for your issue though, so we really
>>> > can't help. What symptoms are you observing?
>>> > -Greg
>>> > Software Engineer #42 @ http://inktank.com | http://ceph.com
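
To expand on Greg's osdmaptool tip above: the current OSD map has to be
fetched from the cluster first and then fed to the tool (a sketch; the object
name and pool id placeholders are Greg's, and the map path is just an example):

  ceph osd getmap -o /tmp/osdmap                              # fetch the binary osdmap
  osdmaptool /tmp/osdmap --test-map-object <objname> --pool <poolid>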



--
Regards
Dominik
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



