Re: two OSDs stuck in peering after starting an OSD for recovery

Hi,
I noticed that the problem is more frequent at night, when traffic is lower.
Maybe it is caused by scrubbing (multiple scrubs running on the same OSD)
combined with too small a "filestore op threads" value, or some other
thread-count setting in my config (a tuning sketch follows the settings below):
        osd heartbeat grace = 15
        filestore flush min = 0
        filestore flusher = false
        filestore fiemap = false
        filestore op threads = 4
        filestore queue max ops = 4096
        filestore queue max bytes = 10485760
        filestore queue committing max bytes = 10485760
        osd op threads = 8
        osd disk threads = 4
        osd recovery threads = 1
        osd recovery max active = 1
        osd recovery op priority = 1
        osd client op priority = 100
        osd max backfills = 1
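
If scrubbing really is the trigger, I am thinking about capping it explicitly.
A rough sketch of what I may try in the [osd] section - the option names are
the standard scrub settings, but the values are only guesses for my cluster,
nothing I have tested yet:

        # at most one concurrent scrub per OSD (this is also the default)
        osd max scrubs = 1
        # do not start new scrubs while the host load average is above this
        osd scrub load threshold = 0.5
        # scrub each PG at most once a day when load allows, but at least weekly
        osd scrub min interval = 86400
        osd scrub max interval = 604800
        # and double the filestore workers, in case the op queue is the bottleneck
        filestore op threads = 8

When it happens again I will also check "ceph pg dump" for PGs in a scrubbing
state on the same OSD, to confirm that the stuck peering really overlaps with
a scrub.
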
--
Regards
Dominik

2013/7/4 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
> I reported a bug: http://tracker.ceph.com/issues/5504
>
> --
> Regards
> Dominik
>
> 2013/7/2 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
>> Hi,
>> Some osd.87 performance graphs:
>> https://www.dropbox.com/s/o07wae2041hu06l/osd_87_performance.PNG
>> After 11:05 I restarted it.
>>
>> Mons... maybe this is the problem.
>>
>> --
>> Regards
>> Dominik
>>
>> 2013/7/2 Andrey Korolyov <andrey@xxxxxxx>:
>>> Hi Dominik,
>>>
>>> What about performance on osd.87 at the moment - do you have any
>>> related measurements?
>>>
>>> As for my version of this issue, it seems that the quorum degrades over
>>> time - when I restarted the mons the problem went away and peering
>>> time dropped by a factor of ten or so.  The problem also seems to be
>>> cumulative in the quorum - I did disk replacements over the last week and
>>> every time peering got worse and worse.  I think it is time to put
>>> more or less formalized problems into the bugtracker:
>>> - such degradation over time plus stuck placement groups,
>>> - a newer kind of problem, also related to epochs - restarting one mon
>>> results in a slight data placement change at the moment the _first rebooted_
>>> monitor comes up; it does not show up with one-hour delays between quorum restarts.
>>>
>>>
>>>
>>>
>>> On Tue, Jul 2, 2013 at 1:37 PM, Dominik Mostowiec
>>> <dominikmostowiec@xxxxxxxxx> wrote:
>>>>
>>>> Hi,
>>>> I got it.
>>>>
>>>> ceph health details
>>>> HEALTH_WARN 3 pgs peering; 3 pgs stuck inactive; 5 pgs stuck unclean;
>>>> recovery 64/38277874 degraded (0.000%)
>>>> pg 5.df9 is stuck inactive for 138669.746512, current state peering,
>>>> last acting [87,2,151]
>>>> pg 5.a82 is stuck inactive for 138638.121867, current state peering,
>>>> last acting [151,87,42]
>>>> pg 5.80d is stuck inactive for 138621.069523, current state peering,
>>>> last acting [151,47,87]
>>>> pg 5.df9 is stuck unclean for 138669.746761, current state peering,
>>>> last acting [87,2,151]
>>>> pg 5.ae2 is stuck unclean for 139479.810499, current state active,
>>>> last acting [87,151,28]
>>>> pg 5.7b6 is stuck unclean for 139479.693271, current state active,
>>>> last acting [87,105,2]
>>>> pg 5.a82 is stuck unclean for 139479.713859, current state peering,
>>>> last acting [151,87,42]
>>>> pg 5.80d is stuck unclean for 139479.800820, current state peering,
>>>> last acting [151,47,87]
>>>> pg 5.df9 is peering, acting [87,2,151]
>>>> pg 5.a82 is peering, acting [151,87,42]
>>>> pg 5.80d is peering, acting [151,47,87]
>>>> recovery 64/38277874 degraded (0.000%)
>>>>
>>>>
>>>> osd pg query for 5.df9:
>>>> { "state": "peering",
>>>>   "up": [
>>>>         87,
>>>>         2,
>>>>         151],
>>>>   "acting": [
>>>>         87,
>>>>         2,
>>>>         151],
>>>>   "info": { "pgid": "5.df9",
>>>>       "last_update": "119454'58844953",
>>>>       "last_complete": "119454'58844953",
>>>>       "log_tail": "119454'58843952",
>>>>       "last_backfill": "MAX",
>>>>       "purged_snaps": "[]",
>>>>       "history": { "epoch_created": 365,
>>>>           "last_epoch_started": 119456,
>>>>           "last_epoch_clean": 119456,
>>>>           "last_epoch_split": 117806,
>>>>           "same_up_since": 119458,
>>>>           "same_interval_since": 119458,
>>>>           "same_primary_since": 119458,
>>>>           "last_scrub": "119442'58732630",
>>>>           "last_scrub_stamp": "2013-06-29 20:02:24.817352",
>>>>           "last_deep_scrub": "119271'57224023",
>>>>           "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
>>>>           "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352"},
>>>>       "stats": { "version": "119454'58844953",
>>>>           "reported": "119458'42382189",
>>>>           "state": "peering",
>>>>           "last_fresh": "2013-06-30 20:35:29.489826",
>>>>           "last_change": "2013-06-30 20:35:28.469854",
>>>>           "last_active": "2013-06-30 20:33:24.126599",
>>>>           "last_clean": "2013-06-30 20:33:24.126599",
>>>>           "last_unstale": "2013-06-30 20:35:29.489826",
>>>>           "mapping_epoch": 119455,
>>>>           "log_start": "119454'58843952",
>>>>           "ondisk_log_start": "119454'58843952",
>>>>           "created": 365,
>>>>           "last_epoch_clean": 365,
>>>>           "parent": "0.0",
>>>>           "parent_split_bits": 0,
>>>>           "last_scrub": "119442'58732630",
>>>>           "last_scrub_stamp": "2013-06-29 20:02:24.817352",
>>>>           "last_deep_scrub": "119271'57224023",
>>>>           "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
>>>>           "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352",
>>>>           "log_size": 135341,
>>>>           "ondisk_log_size": 135341,
>>>>           "stats_invalid": "0",
>>>>           "stat_sum": { "num_bytes": 1010563373,
>>>>               "num_objects": 3099,
>>>>               "num_object_clones": 0,
>>>>               "num_object_copies": 0,
>>>>               "num_objects_missing_on_primary": 0,
>>>>               "num_objects_degraded": 0,
>>>>               "num_objects_unfound": 0,
>>>>               "num_read": 302,
>>>>               "num_read_kb": 0,
>>>>               "num_write": 32264,
>>>>               "num_write_kb": 798650,
>>>>               "num_scrub_errors": 0,
>>>>               "num_objects_recovered": 8235,
>>>>               "num_bytes_recovered": 2085653757,
>>>>               "num_keys_recovered": 249061471},
>>>>           "stat_cat_sum": {},
>>>>           "up": [
>>>>                 87,
>>>>                 2,
>>>>                 151],
>>>>           "acting": [
>>>>                 87,
>>>>                 2,
>>>>                 151]},
>>>>       "empty": 0,
>>>>       "dne": 0,
>>>>       "incomplete": 0,
>>>>       "last_epoch_started": 119454},
>>>>   "recovery_state": [
>>>>         { "name": "Started\/Primary\/Peering\/GetLog",
>>>>           "enter_time": "2013-06-30 20:35:28.545478",
>>>>           "newest_update_osd": 2},
>>>>         { "name": "Started\/Primary\/Peering",
>>>>           "enter_time": "2013-06-30 20:35:28.469841",
>>>>           "past_intervals": [
>>>>                 { "first": 119453,
>>>>                   "last": 119454,
>>>>                   "maybe_went_rw": 1,
>>>>                   "up": [
>>>>                         87,
>>>>                         2,
>>>>                         151],
>>>>                   "acting": [
>>>>                         87,
>>>>                         2,
>>>>                         151]},
>>>>                 { "first": 119455,
>>>>                   "last": 119457,
>>>>                   "maybe_went_rw": 1,
>>>>                   "up": [
>>>>                         2,
>>>>                         151],
>>>>                   "acting": [
>>>>                         2,
>>>>                         151]}],
>>>>           "probing_osds": [
>>>>                 2,
>>>>                 87,
>>>>                 151],
>>>>           "down_osds_we_would_probe": [],
>>>>           "peering_blocked_by": []},
>>>>         { "name": "Started",
>>>>           "enter_time": "2013-06-30 20:35:28.469765"}]}
>>>>
>>>>
>>>> For other PGs: https://www.dropbox.com/s/q5iv8lwzecioy3d/pg_query.tar.tz
>>>>
>>>> --
>>>> Regards
>>>> Dominik
>>>>
>>>> 2013/6/30 Andrey Korolyov <andrey@xxxxxxx>:
>>>> > That's not a loop as it first looked, sorry - I have reproduced the issue
>>>> > many times and there is no such cpu-eating behavior in most cases, only
>>>> > the locked pgs are present. Also, the 'wrong down mark' bug seems to have
>>>> > returned, at least in the 0.61.4 tag. For the first issue, I'll send
>>>> > a link to a core dump as soon as I am able to reproduce it on my test
>>>> > env; the second one is linked with 100% disk utilization, so I'm not sure
>>>> > whether that behavior is right or wrong.
>>>> >
>>>> > On Sat, Jun 29, 2013 at 1:28 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>>> >> On Sat, 29 Jun 2013, Andrey Korolyov wrote:
>>>> >>> There is almost the same problem with the 0.61 cluster, at least with the
>>>> >>> same symptoms. It can be reproduced quite easily - remove an osd and then
>>>> >>> mark it as out, and with quite high probability one of its neighbors will
>>>> >>> be stuck at the end of the peering process with a couple of peering pgs
>>>> >>> whose primary copy is on it. That osd process seems to be stuck in some
>>>> >>> kind of lock, eating exactly 100% of one core.
>>>> >>
>>>> >> Which version?
>>>> >> Can you attach with gdb and get a backtrace to see what it is chewing
>>>> >> on?
>>>> >>
>>>> >> Thanks!
>>>> >> sage
>>>> >>
>>>> >>
>>>> >>>
>>>> >>> On Thu, Jun 13, 2013 at 8:42 PM, Gregory Farnum <greg@xxxxxxxxxxx>
>>>> >>> wrote:
>>>> >>> > On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szibis@xxxxxxxxx>
>>>> >>> > wrote:
>>>> >>> >> Hi, sorry for the late response.
>>>> >>> >>
>>>> >>> >> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
>>>> >>> >>
>>>> >>> >> Logs in attachment, and on google drive, from today.
>>>> >>> >>
>>>> >>> >> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
>>>> >>> >>
>>>> >>> >> We had this problem again today, and the new logs on google drive
>>>> >>> >> are from today.
>>>> >>> >>
>>>> >>> >> Strangely, the problematic osd.71 uses about 10-15% more space
>>>> >>> >> than the other osds in the cluster.
>>>> >>> >>
>>>> >>> >> Today, within one hour, osd.71 failed 3 times in the mon log, and after
>>>> >>> >> the third failure recovery got stuck and many 500 errors appeared in the
>>>> >>> >> http layer on top of rgw. When it is stuck, restarting osd.71, osd.23,
>>>> >>> >> and osd.108, all from the stuck pg, helps, but I even ran a repair on
>>>> >>> >> this osd, just in case.
>>>> >>> >>
>>>> >>> >> I have a theory that this pg holds the rgw object index, or that one
>>>> >>> >> of the osds in this pg has a problem with its local filesystem or the
>>>> >>> >> drive below it (the raid controller reports nothing), but I do not see
>>>> >>> >> any problem in the system.
>>>> >>> >>
>>>> >>> >> How can we find which pg/osd holds the object index of an rgw bucket?
>>>> >>> >
>>>> >>> > You can find the location of any named object by grabbing the OSD
>>>> >>> > map
>>>> >>> > from the cluster and using the osdmaptool: "osdmaptool <mapfile>
>>>> >>> > --test-map-object <objname> --pool <poolid>".
>>>> >>> >
>>>> >>> > You're not providing any context for your issue though, so we really
>>>> >>> > can't help. What symptoms are you observing?
>>>> >>> > -Greg
>>>> >>> > Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Dominik
>>>
>>>
>>
>>
>>
>> --
>> Regards
>> Dominik
>
>
>
> --
> Regards
> Dominik



-- 
Regards
Dominik
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



