Re: two osd stuck on peering after starting osd for recovery

Hi,
Something interesting: the osd with problems uses much more memory.
A standard osd uses about 300 MB;
this osd uses as much as 30 GB.

Can I run any tests to help find where the problem is?
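One way to check where the memory goes, if the osd was built with tcmalloc, is the built-in heap profiler (a sketch; the osd id 87 is just an example):

```shell
# Ask the tcmalloc heap profiler inside the OSD for its current statistics.
ceph tell osd.87 heap stats

# Optionally start profiling and dump a heap profile for later inspection.
ceph tell osd.87 heap start_profiler
ceph tell osd.87 heap dump
```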

--
Regards
Dominik

2013/7/16 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
> Hi,
> I noticed that the problem is more frequent at night, when traffic is lower.
> Maybe it is caused by scrubbing (multiple scrubs on one osd) and
> too small "filestore op threads" or some other thread-count
> settings in my config:
>         osd heartbeat grace = 15
>         filestore flush min = 0
>         filestore flusher = false
>         filestore fiemap = false
>         filestore op threads = 4
>         filestore queue max ops = 4096
>         filestore queue max bytes = 10485760
>         filestore queue committing max bytes = 10485760
>         osd op threads = 8
>         osd disk threads = 4
>         osd recovery threads = 1
>         osd recovery max active = 1
>         osd recovery op priority = 1
>         osd client op priority = 100
>         osd max backfills = 1
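The values actually in effect on a running osd can be checked over the admin socket (a sketch; the socket path and osd id 87 are assumptions for a default install):

```shell
# Dump the live configuration of osd.87 and filter the thread/recovery settings.
ceph --admin-daemon /var/run/ceph/ceph-osd.87.asok config show \
    | grep -E 'threads|backfill|recovery'
```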
> --
> Regards
> Dominik
>
> 2013/7/4 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
>> I reported bug: http://tracker.ceph.com/issues/5504
>>
>> --
>> Regards
>> Dominik
>>
>> 2013/7/2 Dominik Mostowiec <dominikmostowiec@xxxxxxxxx>:
>>> Hi,
>>> Some osd.87 performance graphs:
>>> https://www.dropbox.com/s/o07wae2041hu06l/osd_87_performance.PNG
>>> After 11:05 I restarted it.
>>>
>>> The mons .., maybe they are the problem.
>>>
>>> --
>>> Regards
>>> Dominik
>>>
>>> 2013/7/2 Andrey Korolyov <andrey@xxxxxxx>:
>>>> Hi Dominik,
>>>>
>>>> What about performance on osd.87 at the moment? Do you have any
>>>> related measurements?
>>>>
>>>> As for my version of this issue, it seems the quorum suffers some kind of
>>>> degradation over time: when I restarted the mons, the problem went away and peering
>>>> time dropped by a factor of ten or so.  It also seems the problem has a
>>>> cumulative origin in the quorum: I did disk replacements over the last week, and
>>>> each time peering got worse and worse.  I think it is time to put
>>>> more or less formalized problems into the bug tracker:
>>>> - the degradation over time plus stuck placement groups,
>>>> - a newer kind of problem, also related to epochs: restarting one mon
>>>> results in a slight data-placement change at the moment the _first rebooted_
>>>> monitor comes up; it does not show up with one-hour delays between quorum restarts.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 2, 2013 at 1:37 PM, Dominik Mostowiec
>>>> <dominikmostowiec@xxxxxxxxx> wrote:
>>>>>
>>>>> Hi,
>>>>> I got it.
>>>>>
>>>>> ceph health details
>>>>> HEALTH_WARN 3 pgs peering; 3 pgs stuck inactive; 5 pgs stuck unclean;
>>>>> recovery 64/38277874 degraded (0.000%)
>>>>> pg 5.df9 is stuck inactive for 138669.746512, current state peering,
>>>>> last acting [87,2,151]
>>>>> pg 5.a82 is stuck inactive for 138638.121867, current state peering,
>>>>> last acting [151,87,42]
>>>>> pg 5.80d is stuck inactive for 138621.069523, current state peering,
>>>>> last acting [151,47,87]
>>>>> pg 5.df9 is stuck unclean for 138669.746761, current state peering,
>>>>> last acting [87,2,151]
>>>>> pg 5.ae2 is stuck unclean for 139479.810499, current state active,
>>>>> last acting [87,151,28]
>>>>> pg 5.7b6 is stuck unclean for 139479.693271, current state active,
>>>>> last acting [87,105,2]
>>>>> pg 5.a82 is stuck unclean for 139479.713859, current state peering,
>>>>> last acting [151,87,42]
>>>>> pg 5.80d is stuck unclean for 139479.800820, current state peering,
>>>>> last acting [151,47,87]
>>>>> pg 5.df9 is peering, acting [87,2,151]
>>>>> pg 5.a82 is peering, acting [151,87,42]
>>>>> pg 5.80d is peering, acting [151,47,87]
>>>>> recovery 64/38277874 degraded (0.000%)
>>>>>
>>>>>
>>>>> osd pg query for 5.df9:
>>>>> { "state": "peering",
>>>>>   "up": [
>>>>>         87,
>>>>>         2,
>>>>>         151],
>>>>>   "acting": [
>>>>>         87,
>>>>>         2,
>>>>>         151],
>>>>>   "info": { "pgid": "5.df9",
>>>>>       "last_update": "119454'58844953",
>>>>>       "last_complete": "119454'58844953",
>>>>>       "log_tail": "119454'58843952",
>>>>>       "last_backfill": "MAX",
>>>>>       "purged_snaps": "[]",
>>>>>       "history": { "epoch_created": 365,
>>>>>           "last_epoch_started": 119456,
>>>>>           "last_epoch_clean": 119456,
>>>>>           "last_epoch_split": 117806,
>>>>>           "same_up_since": 119458,
>>>>>           "same_interval_since": 119458,
>>>>>           "same_primary_since": 119458,
>>>>>           "last_scrub": "119442'58732630",
>>>>>           "last_scrub_stamp": "2013-06-29 20:02:24.817352",
>>>>>           "last_deep_scrub": "119271'57224023",
>>>>>           "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
>>>>>           "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352"},
>>>>>       "stats": { "version": "119454'58844953",
>>>>>           "reported": "119458'42382189",
>>>>>           "state": "peering",
>>>>>           "last_fresh": "2013-06-30 20:35:29.489826",
>>>>>           "last_change": "2013-06-30 20:35:28.469854",
>>>>>           "last_active": "2013-06-30 20:33:24.126599",
>>>>>           "last_clean": "2013-06-30 20:33:24.126599",
>>>>>           "last_unstale": "2013-06-30 20:35:29.489826",
>>>>>           "mapping_epoch": 119455,
>>>>>           "log_start": "119454'58843952",
>>>>>           "ondisk_log_start": "119454'58843952",
>>>>>           "created": 365,
>>>>>           "last_epoch_clean": 365,
>>>>>           "parent": "0.0",
>>>>>           "parent_split_bits": 0,
>>>>>           "last_scrub": "119442'58732630",
>>>>>           "last_scrub_stamp": "2013-06-29 20:02:24.817352",
>>>>>           "last_deep_scrub": "119271'57224023",
>>>>>           "last_deep_scrub_stamp": "2013-06-23 02:04:49.654373",
>>>>>           "last_clean_scrub_stamp": "2013-06-29 20:02:24.817352",
>>>>>           "log_size": 135341,
>>>>>           "ondisk_log_size": 135341,
>>>>>           "stats_invalid": "0",
>>>>>           "stat_sum": { "num_bytes": 1010563373,
>>>>>               "num_objects": 3099,
>>>>>               "num_object_clones": 0,
>>>>>               "num_object_copies": 0,
>>>>>               "num_objects_missing_on_primary": 0,
>>>>>               "num_objects_degraded": 0,
>>>>>               "num_objects_unfound": 0,
>>>>>               "num_read": 302,
>>>>>               "num_read_kb": 0,
>>>>>               "num_write": 32264,
>>>>>               "num_write_kb": 798650,
>>>>>               "num_scrub_errors": 0,
>>>>>               "num_objects_recovered": 8235,
>>>>>               "num_bytes_recovered": 2085653757,
>>>>>               "num_keys_recovered": 249061471},
>>>>>           "stat_cat_sum": {},
>>>>>           "up": [
>>>>>                 87,
>>>>>                 2,
>>>>>                 151],
>>>>>           "acting": [
>>>>>                 87,
>>>>>                 2,
>>>>>                 151]},
>>>>>       "empty": 0,
>>>>>       "dne": 0,
>>>>>       "incomplete": 0,
>>>>>       "last_epoch_started": 119454},
>>>>>   "recovery_state": [
>>>>>         { "name": "Started\/Primary\/Peering\/GetLog",
>>>>>           "enter_time": "2013-06-30 20:35:28.545478",
>>>>>           "newest_update_osd": 2},
>>>>>         { "name": "Started\/Primary\/Peering",
>>>>>           "enter_time": "2013-06-30 20:35:28.469841",
>>>>>           "past_intervals": [
>>>>>                 { "first": 119453,
>>>>>                   "last": 119454,
>>>>>                   "maybe_went_rw": 1,
>>>>>                   "up": [
>>>>>                         87,
>>>>>                         2,
>>>>>                         151],
>>>>>                   "acting": [
>>>>>                         87,
>>>>>                         2,
>>>>>                         151]},
>>>>>                 { "first": 119455,
>>>>>                   "last": 119457,
>>>>>                   "maybe_went_rw": 1,
>>>>>                   "up": [
>>>>>                         2,
>>>>>                         151],
>>>>>                   "acting": [
>>>>>                         2,
>>>>>                         151]}],
>>>>>           "probing_osds": [
>>>>>                 2,
>>>>>                 87,
>>>>>                 151],
>>>>>           "down_osds_we_would_probe": [],
>>>>>           "peering_blocked_by": []},
>>>>>         { "name": "Started",
>>>>>           "enter_time": "2013-06-30 20:35:28.469765"}]}
>>>>>
>>>>>
>>>>> For other PGs: https://www.dropbox.com/s/q5iv8lwzecioy3d/pg_query.tar.tz
>>>>>
>>>>> --
>>>>> Regards
>>>>> Dominik
>>>>>
>>>>> 2013/6/30 Andrey Korolyov <andrey@xxxxxxx>:
>>>>> > It is not a loop as it first looked, sorry. I reproduced the issue many
>>>>> > times and there is no such cpu-eating behavior in most cases; only
>>>>> > locked pgs are present. I can also report the return of the 'wrong
>>>>> > down mark' bug, at least on the 0.61.4 tag. For the first issue, I will send
>>>>> > a link with a core dump as soon as I can reproduce it on my test
>>>>> > env; the second one is linked with 100% disk utilization, so I am not sure
>>>>> > whether this behavior is right or wrong.
>>>>> >
>>>>> > On Sat, Jun 29, 2013 at 1:28 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>>>> >> On Sat, 29 Jun 2013, Andrey Korolyov wrote:
>>>>> >>> There is almost the same problem with the 0.61 cluster, at least with the same
>>>>> >>> symptoms. It can be reproduced quite easily: remove an osd, then
>>>>> >>> mark it out, and with quite high probability one of its neighbors will
>>>>> >>> get stuck at the end of the peering process with a couple of peering pgs whose
>>>>> >>> primary copy is on it. Such an osd process seems to be stuck in some kind of
>>>>> >>> lock, eating exactly 100% of one core.
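The reproduction described above boils down to something like this (a sketch; the osd id 12 is just an example):

```shell
# Mark the removed osd out, then watch cluster state: a neighboring osd
# may hang at the end of peering with a few peering pgs it is primary for.
ceph osd out 12
ceph -s

# The stuck osd process shows roughly 100% of one core.
top -b -n1 | grep ceph-osd
```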
>>>>> >>
>>>>> >> Which version?
>>>>> >> Can you attach with gdb and get a backtrace to see what it is chewing
>>>>> >> on?
>>>>> >>
>>>>> >> Thanks!
>>>>> >> sage
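Capturing the backtrace Sage asks for might look like this (a sketch; the pid lookup assumes a single ceph-osd process on the host):

```shell
# Attach gdb non-interactively, dump backtraces of every thread, then detach.
pid=$(pidof ceph-osd)
gdb -p "$pid" --batch -ex "thread apply all bt" > osd-backtrace.txt
```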
>>>>> >>
>>>>> >>
>>>>> >>>
>>>>> >>> On Thu, Jun 13, 2013 at 8:42 PM, Gregory Farnum <greg@xxxxxxxxxxx>
>>>>> >>> wrote:
>>>>> >>> > On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron <szibis@xxxxxxxxx>
>>>>> >>> > wrote:
>>>>> >>> >> Hi, sorry for the late response.
>>>>> >>> >>
>>>>> >>> >> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
>>>>> >>> >>
>>>>> >>> >> Logs in attachment, and on google drive, from today.
>>>>> >>> >>
>>>>> >>> >> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
>>>>> >>> >>
>>>>> >>> >> We have such problem today. And new logs are on google drive with
>>>>> >>> >> today date.
>>>>> >>> >>
>>>>> >>> >> What is strange is that the problematic osd.71 uses about 10-15% more
>>>>> >>> >> space
>>>>> >>> >> than the other osds in the cluster.
>>>>> >>> >>
>>>>> >>> >> Today, within one hour, osd.71 failed 3 times in the mon log, and after
>>>>> >>> >> the third failure recovery got stuck, and many 500 errors appeared in
>>>>> >>> >> the http layer on
>>>>> >>> >> top of rgw. When it is stuck, restarting osd.71, osd.23, and osd.108,
>>>>> >>> >> all from the stuck pg, helps, but I also ran a repair on this osd, just
>>>>> >>> >> in
>>>>> >>> >> case.
>>>>> >>> >>
>>>>> >>> >> I have a theory that the rgw object index is on this pg, or that one
>>>>> >>> >> of
>>>>> >>> >> the osds in this pg has a problem with its local filesystem or the drive
>>>>> >>> >> below it (the raid controller reports nothing about that), but I do not
>>>>> >>> >> see
>>>>> >>> >> any problem in the system.
>>>>> >>> >>
>>>>> >>> >> How can we find out which pg/osd holds the index of objects for an rgw
>>>>> >>> >> bucket?
>>>>> >>> >
>>>>> >>> > You can find the location of any named object by grabbing the OSD
>>>>> >>> > map
>>>>> >>> > from the cluster and using the osdmaptool: "osdmaptool <mapfile>
>>>>> >>> > --test-map-object <objname> --pool <poolid>".
>>>>> >>> >
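Put together, Greg's suggestion might look like this (a sketch; the object name `.dir.mybucket` and pool id 5 are examples, not values from this cluster):

```shell
# Export the current osdmap from the cluster, then map a named object offline.
ceph osd getmap -o /tmp/osdmap
osdmaptool /tmp/osdmap --test-map-object .dir.mybucket --pool 5
```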
>>>>> >>> > You're not providing any context for your issue though, so we really
>>>>> >>> > can't help. What symptoms are you observing?
>>>>> >>> > -Greg
>>>>> >>> > Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>>> >>> > _______________________________________________
>>>>> >>> > ceph-users mailing list
>>>>> >>> > ceph-users@xxxxxxxxxxxxxx
>>>>> >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>> >>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards
>>>>> Dominik
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Regards
>>> Dominik
>>
>>
>>
>> --
>> Regards
>> Dominik
>
>
>
> --
> Regards
> Dominik



-- 
Regards
Dominik