Re: ceph mds memory usage 20GB : is it normal ?

On Thu, May 24, 2018 at 7:22 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
> Thanks!
>
>
> Here is the profile.pdf.
>
> 10-15 min of profiling; I can't do it for longer because my clients were lagging.
>
> But I think it should be enough to observe the RSS memory increase.
>
>

Still can't find any clue. Does the cephfs have an idle period? If it does,
could you decrease the MDS's cache size and check what happens? For
example, run the following commands during the idle period (a rough wrapper
sketch follows them).

ceph daemon mds.xx flush journal
ceph daemon mds.xx config set mds_cache_size 10000
"wait a minute"
ceph tell mds.xx heap stats
ceph daemon mds.xx config set mds_cache_size 0
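
A rough shell sketch of that check, in case it helps (the mds.ceph4-2.odiso.net daemon name is assumed from your earlier perf dump, and the 60-second wait is an arbitrary choice):

#!/bin/sh
# Sketch: shrink the MDS cache during an idle period and compare heap stats before/after.
MDS=mds.ceph4-2.odiso.net                 # assumed daemon name, adjust to your MDS

ceph tell "$MDS" heap stats               # baseline heap/RSS numbers
ceph daemon "$MDS" flush journal          # flush the journal first
ceph daemon "$MDS" config set mds_cache_size 10000   # temporarily cap the cache at 10000 inodes

sleep 60                                  # give the MDS time to trim its cache

ceph tell "$MDS" heap stats               # compare against the baseline
ceph daemon "$MDS" config set mds_cache_size 0       # restore the default (0 = unlimited)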


>
>
> ----- Original Message -----
> From: "Zheng Yan" <ukernel@xxxxxxxxx>
> To: "aderumier" <aderumier@xxxxxxxxx>
> Sent: Thursday, May 24, 2018 11:34:20
> Subject: Re: ceph mds memory usage 20GB : is it normal ?
>
> On Tue, May 22, 2018 at 3:11 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>> Hi, some new stats: MDS memory is now 16G.
>>
>> I have almost the same number of items and bytes in cache as some weeks ago, when the MDS was using 8G. (ceph 12.2.5)
>>
>>
>> root@ceph4-2:~# while sleep 1; do ceph daemon mds.ceph4-2.odiso.net perf dump | jq '.mds_mem.rss'; ceph daemon mds.ceph4-2.odiso.net dump_mempools | jq -c '.mds_co'; done
>> 16905052
>> {"items":43350988,"bytes":5257428143}
>> 16905052
>> {"items":43428329,"bytes":5283850173}
>> 16905052
>> {"items":43209167,"bytes":5208578149}
>> 16905052
>> {"items":43177631,"bytes":5198833577}
>> 16905052
>> {"items":43312734,"bytes":5252649462}
>> 16905052
>> {"items":43355753,"bytes":5277197972}
>> 16905052
>> {"items":43700693,"bytes":5303376141}
>> 16905052
>> {"items":43115809,"bytes":5156628138}
>> ^C
>>
>>
>>
>>
>> root@ceph4-2:~# ceph status
>> cluster:
>> id: e22b8e83-3036-4fe5-8fd5-5ce9d539beca
>> health: HEALTH_OK
>>
>> services:
>> mon: 3 daemons, quorum ceph4-1,ceph4-2,ceph4-3
>> mgr: ceph4-1.odiso.net(active), standbys: ceph4-2.odiso.net, ceph4-3.odiso.net
>> mds: cephfs4-1/1/1 up {0=ceph4-2.odiso.net=up:active}, 2 up:standby
>> osd: 18 osds: 18 up, 18 in
>> rgw: 3 daemons active
>>
>> data:
>> pools: 11 pools, 1992 pgs
>> objects: 75677k objects, 6045 GB
>> usage: 20579 GB used, 6246 GB / 26825 GB avail
>> pgs: 1992 active+clean
>>
>> io:
>> client: 14441 kB/s rd, 2550 kB/s wr, 371 op/s rd, 95 op/s wr
>>
>>
>> root@ceph4-2:~# ceph daemon mds.ceph4-2.odiso.net cache status
>> {
>> "pool": {
>> "items": 44523608,
>> "bytes": 5326049009
>> }
>> }
>>
>>
>> root@ceph4-2:~# ceph daemon mds.ceph4-2.odiso.net perf dump
>> {
>> "AsyncMessenger::Worker-0": {
>> "msgr_recv_messages": 798876013,
>> "msgr_send_messages": 825999506,
>> "msgr_recv_bytes": 7003223097381,
>> "msgr_send_bytes": 691501283744,
>> "msgr_created_connections": 148,
>> "msgr_active_connections": 146,
>> "msgr_running_total_time": 39914.832387470,
>> "msgr_running_send_time": 13744.704199430,
>> "msgr_running_recv_time": 32342.160588451,
>> "msgr_running_fast_dispatch_time": 5996.336446782
>> },
>> "AsyncMessenger::Worker-1": {
>> "msgr_recv_messages": 429668771,
>> "msgr_send_messages": 414760220,
>> "msgr_recv_bytes": 5003149410825,
>> "msgr_send_bytes": 396281427789,
>> "msgr_created_connections": 132,
>> "msgr_active_connections": 132,
>> "msgr_running_total_time": 23644.410515392,
>> "msgr_running_send_time": 7669.068710688,
>> "msgr_running_recv_time": 19751.610043696,
>> "msgr_running_fast_dispatch_time": 4331.023453385
>> },
>> "AsyncMessenger::Worker-2": {
>> "msgr_recv_messages": 1312910919,
>> "msgr_send_messages": 1260040403,
>> "msgr_recv_bytes": 5330386980976,
>> "msgr_send_bytes": 3341965016878,
>> "msgr_created_connections": 143,
>> "msgr_active_connections": 138,
>> "msgr_running_total_time": 61696.635450100,
>> "msgr_running_send_time": 23491.027014598,
>> "msgr_running_recv_time": 53858.409319734,
>> "msgr_running_fast_dispatch_time": 4312.451966809
>> },
>> "finisher-PurgeQueue": {
>> "queue_len": 0,
>> "complete_latency": {
>> "avgcount": 1889416,
>> "sum": 29224.227703697,
>> "avgtime": 0.015467333
>> }
>> },
>> "mds": {
>> "request": 1822420924,
>> "reply": 1822420886,
>> "reply_latency": {
>> "avgcount": 1822420886,
>> "sum": 5258467.616943274,
>> "avgtime": 0.002885429
>> },
>> "forward": 0,
>> "dir_fetch": 116035485,
>> "dir_commit": 1865012,
>> "dir_split": 17,
>> "dir_merge": 24,
>> "inode_max": 2147483647,
>> "inodes": 1600438,
>> "inodes_top": 210492,
>> "inodes_bottom": 100560,
>> "inodes_pin_tail": 1289386,
>> "inodes_pinned": 1299735,
>> "inodes_expired": 22223476046,
>> "inodes_with_caps": 1299137,
>> "caps": 2211546,
>> "subtrees": 2,
>> "traverse": 1953482456,
>> "traverse_hit": 1127647211,
>> "traverse_forward": 0,
>> "traverse_discover": 0,
>> "traverse_dir_fetch": 105833969,
>> "traverse_remote_ino": 31686,
>> "traverse_lock": 4344,
>> "load_cent": 182244014474,
>> "q": 104,
>> "exported": 0,
>> "exported_inodes": 0,
>> "imported": 0,
>> "imported_inodes": 0
>> },
>> "mds_cache": {
>> "num_strays": 14980,
>> "num_strays_delayed": 7,
>> "num_strays_enqueuing": 0,
>> "strays_created": 1672815,
>> "strays_enqueued": 1659514,
>> "strays_reintegrated": 666,
>> "strays_migrated": 0,
>> "num_recovering_processing": 0,
>> "num_recovering_enqueued": 0,
>> "num_recovering_prioritized": 0,
>> "recovery_started": 2,
>> "recovery_completed": 2,
>> "ireq_enqueue_scrub": 0,
>> "ireq_exportdir": 0,
>> "ireq_flush": 0,
>> "ireq_fragmentdir": 41,
>> "ireq_fragstats": 0,
>> "ireq_inodestats": 0
>> },
>> "mds_log": {
>> "evadd": 357717092,
>> "evex": 357717106,
>> "evtrm": 357716741,
>> "ev": 105198,
>> "evexg": 0,
>> "evexd": 365,
>> "segadd": 437124,
>> "segex": 437124,
>> "segtrm": 437123,
>> "seg": 130,
>> "segexg": 0,
>> "segexd": 1,
>> "expos": 6916004026339,
>> "wrpos": 6916179441942,
>> "rdpos": 6319502327537,
>> "jlat": {
>> "avgcount": 59071693,
>> "sum": 120823.311894779,
>> "avgtime": 0.002045367
>> },
>> "replayed": 104847
>> },
>> "mds_mem": {
>> "ino": 1599422,
>> "ino+": 22152405695,
>> "ino-": 22150806273,
>> "dir": 256943,
>> "dir+": 18460298,
>> "dir-": 18203355,
>> "dn": 1600689,
>> "dn+": 22227888283,
>> "dn-": 22226287594,
>> "cap": 2211546,
>> "cap+": 1674287476,
>> "cap-": 1672075930,
>> "rss": 16905052,
>> "heap": 313916,
>> "buf": 0
>> },
>> "mds_server": {
>> "dispatch_client_request": 1964131912,
>> "dispatch_server_request": 0,
>> "handle_client_request": 1822420924,
>> "handle_client_session": 15557609,
>> "handle_slave_request": 0,
>> "req_create": 4116952,
>> "req_getattr": 18696543,
>> "req_getfilelock": 0,
>> "req_link": 6625,
>> "req_lookup": 1425824734,
>> "req_lookuphash": 0,
>> "req_lookupino": 0,
>> "req_lookupname": 8703,
>> "req_lookupparent": 0,
>> "req_lookupsnap": 0,
>> "req_lssnap": 0,
>> "req_mkdir": 371878,
>> "req_mknod": 0,
>> "req_mksnap": 0,
>> "req_open": 351119806,
>> "req_readdir": 17103599,
>> "req_rename": 2437529,
>> "req_renamesnap": 0,
>> "req_rmdir": 78789,
>> "req_rmsnap": 0,
>> "req_rmxattr": 0,
>> "req_setattr": 4547650,
>> "req_setdirlayout": 0,
>> "req_setfilelock": 633219,
>> "req_setlayout": 0,
>> "req_setxattr": 2,
>> "req_symlink": 2520,
>> "req_unlink": 1589288
>> },
>> "mds_sessions": {
>> "session_count": 321,
>> "session_add": 383,
>> "session_remove": 62
>> },
>> "objecter": {
>> "op_active": 0,
>> "op_laggy": 0,
>> "op_send": 197932443,
>> "op_send_bytes": 605992324653,
>> "op_resend": 22,
>> "op_reply": 197932421,
>> "op": 197932421,
>> "op_r": 116256030,
>> "op_w": 81676391,
>> "op_rmw": 0,
>> "op_pg": 0,
>> "osdop_stat": 1518341,
>> "osdop_create": 4314348,
>> "osdop_read": 79810,
>> "osdop_write": 59151421,
>> "osdop_writefull": 237358,
>> "osdop_writesame": 0,
>> "osdop_append": 0,
>> "osdop_zero": 2,
>> "osdop_truncate": 9,
>> "osdop_delete": 2320714,
>> "osdop_mapext": 0,
>> "osdop_sparse_read": 0,
>> "osdop_clonerange": 0,
>> "osdop_getxattr": 29426577,
>> "osdop_setxattr": 8628696,
>> "osdop_cmpxattr": 0,
>> "osdop_rmxattr": 0,
>> "osdop_resetxattrs": 0,
>> "osdop_tmap_up": 0,
>> "osdop_tmap_put": 0,
>> "osdop_tmap_get": 0,
>> "osdop_call": 0,
>> "osdop_watch": 0,
>> "osdop_notify": 0,
>> "osdop_src_cmpxattr": 0,
>> "osdop_pgls": 0,
>> "osdop_pgls_filter": 0,
>> "osdop_other": 13551599,
>> "linger_active": 0,
>> "linger_send": 0,
>> "linger_resend": 0,
>> "linger_ping": 0,
>> "poolop_active": 0,
>> "poolop_send": 0,
>> "poolop_resend": 0,
>> "poolstat_active": 0,
>> "poolstat_send": 0,
>> "poolstat_resend": 0,
>> "statfs_active": 0,
>> "statfs_send": 0,
>> "statfs_resend": 0,
>> "command_active": 0,
>> "command_send": 0,
>> "command_resend": 0,
>> "map_epoch": 3907,
>> "map_full": 0,
>> "map_inc": 601,
>> "osd_sessions": 18,
>> "osd_session_open": 20,
>> "osd_session_close": 2,
>> "osd_laggy": 0,
>> "omap_wr": 3595801,
>> "omap_rd": 232070972,
>> "omap_del": 272598
>> },
>> "purge_queue": {
>> "pq_executing_ops": 0,
>> "pq_executing": 0,
>> "pq_executed": 1659514
>> },
>> "throttle-msgr_dispatch_throttler-mds": {
>> "val": 0,
>> "max": 104857600,
>> "get_started": 0,
>> "get": 2541455703,
>> "get_sum": 17148691767160,
>> "get_or_fail_fail": 0,
>> "get_or_fail_success": 2541455703,
>> "take": 0,
>> "take_sum": 0,
>> "put": 2541455703,
>> "put_sum": 17148691767160,
>> "wait": {
>> "avgcount": 0,
>> "sum": 0.000000000,
>> "avgtime": 0.000000000
>> }
>> },
>> "throttle-objecter_bytes": {
>> "val": 0,
>> "max": 104857600,
>> "get_started": 0,
>> "get": 0,
>> "get_sum": 0,
>> "get_or_fail_fail": 0,
>> "get_or_fail_success": 0,
>> "take": 197932421,
>> "take_sum": 606323353310,
>> "put": 182060027,
>> "put_sum": 606323353310,
>> "wait": {
>> "avgcount": 0,
>> "sum": 0.000000000,
>> "avgtime": 0.000000000
>> }
>> },
>> "throttle-objecter_ops": {
>> "val": 0,
>> "max": 1024,
>> "get_started": 0,
>> "get": 0,
>> "get_sum": 0,
>> "get_or_fail_fail": 0,
>> "get_or_fail_success": 0,
>> "take": 197932421,
>> "take_sum": 197932421,
>> "put": 197932421,
>> "put_sum": 197932421,
>> "wait": {
>> "avgcount": 0,
>> "sum": 0.000000000,
>> "avgtime": 0.000000000
>> }
>> },
>> "throttle-write_buf_throttle": {
>> "val": 0,
>> "max": 3758096384,
>> "get_started": 0,
>> "get": 1659514,
>> "get_sum": 154334946,
>> "get_or_fail_fail": 0,
>> "get_or_fail_success": 1659514,
>> "take": 0,
>> "take_sum": 0,
>> "put": 79728,
>> "put_sum": 154334946,
>> "wait": {
>> "avgcount": 0,
>> "sum": 0.000000000,
>> "avgtime": 0.000000000
>> }
>> },
>> "throttle-write_buf_throttle-0x55decea8e140": {
>> "val": 255839,
>> "max": 3758096384,
>> "get_started": 0,
>> "get": 357717092,
>> "get_sum": 596677113363,
>> "get_or_fail_fail": 0,
>> "get_or_fail_success": 357717092,
>> "take": 0,
>> "take_sum": 0,
>> "put": 59071693,
>> "put_sum": 596676857524,
>> "wait": {
>> "avgcount": 0,
>> "sum": 0.000000000,
>> "avgtime": 0.000000000
>> }
>> }
>> }
>>
>>
>
> Maybe there is a memory leak. What is the output of 'ceph tell mds.xx heap
> stats'? If the RSS size keeps increasing, please run the heap profiler for
> a period of time.
>
>
> ceph tell mds.xx heap start_profiler
> "wait some time"
> ceph tell mds.xx heap dump
> ceph tell mds.xx heap stop_profiler
> pprof --pdf <location of ceph-mds binary> \
>   /var/log/ceph/mds.xxx.profile.* > profile.pdf
>
> send profile.pdf to us
>
> Regards
> Yan, Zheng
>
>>
>> ----- Original Message -----
>> From: "Webert de Souza Lima" <webert.boss@xxxxxxxxx>
>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Monday, May 14, 2018 15:14:35
>> Subject: Re: ceph mds memory usage 20GB : is it normal ?
>>
>> On Sat, May 12, 2018 at 3:11 AM Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>
>>
>> The documentation (luminous) says:
>>
>>>mds cache size
>>>
>>>Description: The number of inodes to cache. A value of 0 indicates an unlimited number. It is recommended to use mds_cache_memory_limit to limit the amount of memory the MDS cache uses.
>>>Type: 32-bit Integer
>>>Default: 0
>>>
>>
>>
>>> and my mds_cache_memory_limit is currently at 5GB.
>>
>> Yeah, I only suggested that because the high memory usage seemed to trouble you and it might be a bug, so it's more of a workaround.
>>
>> Regards,
>> Webert Lima
>> DevOps Engineer at MAV Tecnologia
>> Belo Horizonte - Brasil
>> IRC NICK - WebertRLZ
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



