Re: ceph mds memory usage 20GB : is it normal ?


 



On Tue, May 22, 2018 at 3:11 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
> Hi, some new stats: MDS memory is now at 16G.
>
> The cache holds almost the same number of items and bytes as a few weeks ago, when the MDS was using 8G (ceph 12.2.5).
>
>
> root@ceph4-2:~# while sleep 1; do ceph daemon mds.ceph4-2.odiso.net perf dump | jq '.mds_mem.rss'; ceph daemon mds.ceph4-2.odiso.net dump_mempools | jq -c '.mds_co'; done
> 16905052
> {"items":43350988,"bytes":5257428143}
> 16905052
> {"items":43428329,"bytes":5283850173}
> 16905052
> {"items":43209167,"bytes":5208578149}
> 16905052
> {"items":43177631,"bytes":5198833577}
> 16905052
> {"items":43312734,"bytes":5252649462}
> 16905052
> {"items":43355753,"bytes":5277197972}
> 16905052
> {"items":43700693,"bytes":5303376141}
> 16905052
> {"items":43115809,"bytes":5156628138}
> ^C
>
>
>
>
> root@ceph4-2:~# ceph status
>   cluster:
>     id:     e22b8e83-3036-4fe5-8fd5-5ce9d539beca
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum ceph4-1,ceph4-2,ceph4-3
>     mgr: ceph4-1.odiso.net(active), standbys: ceph4-2.odiso.net, ceph4-3.odiso.net
>     mds: cephfs4-1/1/1 up  {0=ceph4-2.odiso.net=up:active}, 2 up:standby
>     osd: 18 osds: 18 up, 18 in
>     rgw: 3 daemons active
>
>   data:
>     pools:   11 pools, 1992 pgs
>     objects: 75677k objects, 6045 GB
>     usage:   20579 GB used, 6246 GB / 26825 GB avail
>     pgs:     1992 active+clean
>
>   io:
>     client:   14441 kB/s rd, 2550 kB/s wr, 371 op/s rd, 95 op/s wr
>
>
> root@ceph4-2:~# ceph daemon mds.ceph4-2.odiso.net cache status
> {
>     "pool": {
>         "items": 44523608,
>         "bytes": 5326049009
>     }
> }
>
>
> root@ceph4-2:~# ceph daemon mds.ceph4-2.odiso.net perf dump
> {
>     "AsyncMessenger::Worker-0": {
>         "msgr_recv_messages": 798876013,
>         "msgr_send_messages": 825999506,
>         "msgr_recv_bytes": 7003223097381,
>         "msgr_send_bytes": 691501283744,
>         "msgr_created_connections": 148,
>         "msgr_active_connections": 146,
>         "msgr_running_total_time": 39914.832387470,
>         "msgr_running_send_time": 13744.704199430,
>         "msgr_running_recv_time": 32342.160588451,
>         "msgr_running_fast_dispatch_time": 5996.336446782
>     },
>     "AsyncMessenger::Worker-1": {
>         "msgr_recv_messages": 429668771,
>         "msgr_send_messages": 414760220,
>         "msgr_recv_bytes": 5003149410825,
>         "msgr_send_bytes": 396281427789,
>         "msgr_created_connections": 132,
>         "msgr_active_connections": 132,
>         "msgr_running_total_time": 23644.410515392,
>         "msgr_running_send_time": 7669.068710688,
>         "msgr_running_recv_time": 19751.610043696,
>         "msgr_running_fast_dispatch_time": 4331.023453385
>     },
>     "AsyncMessenger::Worker-2": {
>         "msgr_recv_messages": 1312910919,
>         "msgr_send_messages": 1260040403,
>         "msgr_recv_bytes": 5330386980976,
>         "msgr_send_bytes": 3341965016878,
>         "msgr_created_connections": 143,
>         "msgr_active_connections": 138,
>         "msgr_running_total_time": 61696.635450100,
>         "msgr_running_send_time": 23491.027014598,
>         "msgr_running_recv_time": 53858.409319734,
>         "msgr_running_fast_dispatch_time": 4312.451966809
>     },
>     "finisher-PurgeQueue": {
>         "queue_len": 0,
>         "complete_latency": {
>             "avgcount": 1889416,
>             "sum": 29224.227703697,
>             "avgtime": 0.015467333
>         }
>     },
>     "mds": {
>         "request": 1822420924,
>         "reply": 1822420886,
>         "reply_latency": {
>             "avgcount": 1822420886,
>             "sum": 5258467.616943274,
>             "avgtime": 0.002885429
>         },
>         "forward": 0,
>         "dir_fetch": 116035485,
>         "dir_commit": 1865012,
>         "dir_split": 17,
>         "dir_merge": 24,
>         "inode_max": 2147483647,
>         "inodes": 1600438,
>         "inodes_top": 210492,
>         "inodes_bottom": 100560,
>         "inodes_pin_tail": 1289386,
>         "inodes_pinned": 1299735,
>         "inodes_expired": 22223476046,
>         "inodes_with_caps": 1299137,
>         "caps": 2211546,
>         "subtrees": 2,
>         "traverse": 1953482456,
>         "traverse_hit": 1127647211,
>         "traverse_forward": 0,
>         "traverse_discover": 0,
>         "traverse_dir_fetch": 105833969,
>         "traverse_remote_ino": 31686,
>         "traverse_lock": 4344,
>         "load_cent": 182244014474,
>         "q": 104,
>         "exported": 0,
>         "exported_inodes": 0,
>         "imported": 0,
>         "imported_inodes": 0
>     },
>     "mds_cache": {
>         "num_strays": 14980,
>         "num_strays_delayed": 7,
>         "num_strays_enqueuing": 0,
>         "strays_created": 1672815,
>         "strays_enqueued": 1659514,
>         "strays_reintegrated": 666,
>         "strays_migrated": 0,
>         "num_recovering_processing": 0,
>         "num_recovering_enqueued": 0,
>         "num_recovering_prioritized": 0,
>         "recovery_started": 2,
>         "recovery_completed": 2,
>         "ireq_enqueue_scrub": 0,
>         "ireq_exportdir": 0,
>         "ireq_flush": 0,
>         "ireq_fragmentdir": 41,
>         "ireq_fragstats": 0,
>         "ireq_inodestats": 0
>     },
>     "mds_log": {
>         "evadd": 357717092,
>         "evex": 357717106,
>         "evtrm": 357716741,
>         "ev": 105198,
>         "evexg": 0,
>         "evexd": 365,
>         "segadd": 437124,
>         "segex": 437124,
>         "segtrm": 437123,
>         "seg": 130,
>         "segexg": 0,
>         "segexd": 1,
>         "expos": 6916004026339,
>         "wrpos": 6916179441942,
>         "rdpos": 6319502327537,
>         "jlat": {
>             "avgcount": 59071693,
>             "sum": 120823.311894779,
>             "avgtime": 0.002045367
>         },
>         "replayed": 104847
>     },
>     "mds_mem": {
>         "ino": 1599422,
>         "ino+": 22152405695,
>         "ino-": 22150806273,
>         "dir": 256943,
>         "dir+": 18460298,
>         "dir-": 18203355,
>         "dn": 1600689,
>         "dn+": 22227888283,
>         "dn-": 22226287594,
>         "cap": 2211546,
>         "cap+": 1674287476,
>         "cap-": 1672075930,
>         "rss": 16905052,
>         "heap": 313916,
>         "buf": 0
>     },
>     "mds_server": {
>         "dispatch_client_request": 1964131912,
>         "dispatch_server_request": 0,
>         "handle_client_request": 1822420924,
>         "handle_client_session": 15557609,
>         "handle_slave_request": 0,
>         "req_create": 4116952,
>         "req_getattr": 18696543,
>         "req_getfilelock": 0,
>         "req_link": 6625,
>         "req_lookup": 1425824734,
>         "req_lookuphash": 0,
>         "req_lookupino": 0,
>         "req_lookupname": 8703,
>         "req_lookupparent": 0,
>         "req_lookupsnap": 0,
>         "req_lssnap": 0,
>         "req_mkdir": 371878,
>         "req_mknod": 0,
>         "req_mksnap": 0,
>         "req_open": 351119806,
>         "req_readdir": 17103599,
>         "req_rename": 2437529,
>         "req_renamesnap": 0,
>         "req_rmdir": 78789,
>         "req_rmsnap": 0,
>         "req_rmxattr": 0,
>         "req_setattr": 4547650,
>         "req_setdirlayout": 0,
>         "req_setfilelock": 633219,
>         "req_setlayout": 0,
>         "req_setxattr": 2,
>         "req_symlink": 2520,
>         "req_unlink": 1589288
>     },
>     "mds_sessions": {
>         "session_count": 321,
>         "session_add": 383,
>         "session_remove": 62
>     },
>     "objecter": {
>         "op_active": 0,
>         "op_laggy": 0,
>         "op_send": 197932443,
>         "op_send_bytes": 605992324653,
>         "op_resend": 22,
>         "op_reply": 197932421,
>         "op": 197932421,
>         "op_r": 116256030,
>         "op_w": 81676391,
>         "op_rmw": 0,
>         "op_pg": 0,
>         "osdop_stat": 1518341,
>         "osdop_create": 4314348,
>         "osdop_read": 79810,
>         "osdop_write": 59151421,
>         "osdop_writefull": 237358,
>         "osdop_writesame": 0,
>         "osdop_append": 0,
>         "osdop_zero": 2,
>         "osdop_truncate": 9,
>         "osdop_delete": 2320714,
>         "osdop_mapext": 0,
>         "osdop_sparse_read": 0,
>         "osdop_clonerange": 0,
>         "osdop_getxattr": 29426577,
>         "osdop_setxattr": 8628696,
>         "osdop_cmpxattr": 0,
>         "osdop_rmxattr": 0,
>         "osdop_resetxattrs": 0,
>         "osdop_tmap_up": 0,
>         "osdop_tmap_put": 0,
>         "osdop_tmap_get": 0,
>         "osdop_call": 0,
>         "osdop_watch": 0,
>         "osdop_notify": 0,
>         "osdop_src_cmpxattr": 0,
>         "osdop_pgls": 0,
>         "osdop_pgls_filter": 0,
>         "osdop_other": 13551599,
>         "linger_active": 0,
>         "linger_send": 0,
>         "linger_resend": 0,
>         "linger_ping": 0,
>         "poolop_active": 0,
>         "poolop_send": 0,
>         "poolop_resend": 0,
>         "poolstat_active": 0,
>         "poolstat_send": 0,
>         "poolstat_resend": 0,
>         "statfs_active": 0,
>         "statfs_send": 0,
>         "statfs_resend": 0,
>         "command_active": 0,
>         "command_send": 0,
>         "command_resend": 0,
>         "map_epoch": 3907,
>         "map_full": 0,
>         "map_inc": 601,
>         "osd_sessions": 18,
>         "osd_session_open": 20,
>         "osd_session_close": 2,
>         "osd_laggy": 0,
>         "omap_wr": 3595801,
>         "omap_rd": 232070972,
>         "omap_del": 272598
>     },
>     "purge_queue": {
>         "pq_executing_ops": 0,
>         "pq_executing": 0,
>         "pq_executed": 1659514
>     },
>     "throttle-msgr_dispatch_throttler-mds": {
>         "val": 0,
>         "max": 104857600,
>         "get_started": 0,
>         "get": 2541455703,
>         "get_sum": 17148691767160,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 2541455703,
>         "take": 0,
>         "take_sum": 0,
>         "put": 2541455703,
>         "put_sum": 17148691767160,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-objecter_bytes": {
>         "val": 0,
>         "max": 104857600,
>         "get_started": 0,
>         "get": 0,
>         "get_sum": 0,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 0,
>         "take": 197932421,
>         "take_sum": 606323353310,
>         "put": 182060027,
>         "put_sum": 606323353310,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-objecter_ops": {
>         "val": 0,
>         "max": 1024,
>         "get_started": 0,
>         "get": 0,
>         "get_sum": 0,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 0,
>         "take": 197932421,
>         "take_sum": 197932421,
>         "put": 197932421,
>         "put_sum": 197932421,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-write_buf_throttle": {
>         "val": 0,
>         "max": 3758096384,
>         "get_started": 0,
>         "get": 1659514,
>         "get_sum": 154334946,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 1659514,
>         "take": 0,
>         "take_sum": 0,
>         "put": 79728,
>         "put_sum": 154334946,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     },
>     "throttle-write_buf_throttle-0x55decea8e140": {
>         "val": 255839,
>         "max": 3758096384,
>         "get_started": 0,
>         "get": 357717092,
>         "get_sum": 596677113363,
>         "get_or_fail_fail": 0,
>         "get_or_fail_success": 357717092,
>         "take": 0,
>         "take_sum": 0,
>         "put": 59071693,
>         "put_sum": 596676857524,
>         "wait": {
>             "avgcount": 0,
>             "sum": 0.000000000,
>             "avgtime": 0.000000000
>         }
>     }
> }
>
>

Maybe there is a memory leak. What is the output of 'ceph tell mds.xx heap
stats'? If the RSS size keeps increasing, please run the heap profiler for
a period of time.


ceph tell mds.xx heap start_profiler
# wait some time
ceph tell mds.xx heap dump
ceph tell mds.xx heap stop_profiler
pprof --pdf <location of ceph-mds binary> /var/log/ceph/mds.xxx.profile.* > profile.pdf

Then send profile.pdf to us.
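
To judge whether the RSS is really out of line with the cache, and whether it keeps growing, a small sketch like the one below may help. The helper names rss_to_cache_ratio and rss_growth_kb are hypothetical (they are not part of Ceph), and the sketch assumes that mds_mem.rss is reported in KB, which matches the 16G figure quoted above.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not Ceph commands.

# Ratio of resident memory to the size of the MDS cache mempool.
#   $1: mds_mem.rss from "perf dump" (assumed to be in KB)
#   $2: mds_co "bytes" from "dump_mempools"
rss_to_cache_ratio() {
    awk -v rss_kb="$1" -v cache_bytes="$2" \
        'BEGIN { printf "%.2f\n", (rss_kb * 1024) / cache_bytes }'
}

# Growth in KB between the first and the last RSS sample.
# Reads one sample per line from stdin.
rss_growth_kb() {
    awk 'NR == 1 { first = $1 }
         { last = $1 }
         END { if (NR > 0) printf "%d\n", last - first }'
}

# Example with the figures quoted above:
#   rss_to_cache_ratio 16905052 5257428143
#
# Example sampling loop (needs access to the MDS admin socket):
#   for i in 1 2 3; do
#       ceph daemon mds.xx perf dump | jq '.mds_mem.rss'
#       sleep 600
#   done | rss_growth_kb
```

A ratio well above 1 that stays flat points more towards allocator overhead or fragmentation, while steady growth across samples is the case where the heap profile above is most useful.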

Regards
Yan, Zheng

>
> ----- Original Message -----
> From: "Webert de Souza Lima" <webert.boss@xxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Monday, May 14, 2018 15:14:35
> Subject: Re:  ceph mds memory usage 20GB : is it normal ?
>
> On Sat, May 12, 2018 at 3:11 AM Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>
>
> The documentation (luminous) says:
>
>>mds cache size
>>
>>Description: The number of inodes to cache. A value of 0 indicates an unlimited number. It is recommended to use mds_cache_memory_limit to limit the amount of memory the MDS cache uses.
>>Type: 32-bit Integer
>>Default: 0
>>
>
>> and my mds_cache_memory_limit is currently at 5GB.
>
> Yeah, I only suggested that because the high memory usage seemed to trouble you and it might be a bug, so it's more of a workaround.
>
> Regards,
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
> IRC NICK - WebertRLZ
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



