Hi,
I forgot to say that the value in the Diff may be lower than the real one (8 MB), because memory usage was still high and I had prepared a new configuration with a lower limit (5 MB). I haven't reloaded the daemons yet, but maybe the configuration was loaded again today, and that's why it's using less than 1 GB of RAM right now. Of course I haven't rebooted the machine, but if the daemon was killed for high memory usage, the new configuration may have been loaded then.
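In case it's useful, here is a rough sketch of how a new limit can be pushed to the running MDS without restarting it; the daemon name is simply the one from the dumps below, and 5242880 is an assumed value (5 MB expressed in bytes):

    # apply the new cache limit to the running MDS via its admin socket (run on the MDS host)
    ceph daemon mds.kavehome-mgto-pro-fs01 config set mds_cache_memory_limit 5242880
    # check which value the daemon is actually using right now
    ceph daemon mds.kavehome-mgto-pro-fs01 config get mds_cache_memory_limit

Otherwise, a change made only in ceph.conf would normally be picked up when the daemon restarts (or is respawned after a crash), which would match the behaviour described above.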
Greetings!
2018-07-23 21:07 GMT+02:00 Daniel Carrasco <d.carrasco@xxxxxxxxx>:
Thanks! It's true that I've seen continuous memory growth, but I hadn't thought of a memory leak. I don't remember exactly how many hours were necessary to fill the memory, but I calculate it was about 14h.

With the new configuration it looks like memory grows slowly and stops when it reaches 5-6 GB. Sometimes it looks like the daemon flushes the memory, drops back to less than 1 GB, and then slowly grows again to 5-6 GB. Just today, I don't know why or how, because I haven't changed anything on the Ceph cluster, the memory has dropped to less than 1 GB and has stayed there 8 hours later. I've only deployed a git repository with some changes.

Some of my nodes are still on version 12.2.5, because I detected this problem and didn't know whether it was caused by the latest version, so I stopped the update. The one that is the active MDS is on the latest version (12.2.7), and I've scheduled an update of the rest of the nodes for Thursday.

A graph of the memory usage over the latest days with that configuration:

I have no information from when the problem was at its worst (512 MB MDS memory limit and 15-16 GB of usage), because memory usage was not logged then. I only have heap stats that were dumped while the daemon was filling the memory:

# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
2018-07-19 00:43:46.142560 7f5a7a7fc700  0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
2018-07-19 00:43:46.181133 7f5a7b7fe700  0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
------------------------------------------------
MALLOC:      9982980144 ( 9520.5 MiB) Bytes in use by application
MALLOC: +             0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +     172148208 (  164.2 MiB) Bytes in central cache freelist
MALLOC: +      19031168 (   18.1 MiB) Bytes in transfer cache freelist
MALLOC: +      23987552 (   22.9 MiB) Bytes in thread cache freelists
MALLOC: +      20869280 (   19.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
MALLOC: +    3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   14132703392 (13478.0 MiB) Virtual address space used
MALLOC:
MALLOC:           63875 Spans in use
MALLOC:              16 Thread heaps in use
MALLOC:            8192 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

Here's the Diff:
----------------------------------------------------------------------
{
  "diff": {
    "current": {
      "admin_socket": "/var/run/ceph/ceph-mds.kavehome-mgto-pro-fs01.asok",
      "auth_client_required": "cephx",
      "bluestore_cache_size_hdd": "80530636",
      "bluestore_cache_size_ssd": "80530636",
      "err_to_stderr": "true",
      "fsid": "f015f888-6e0c-4203-aea8-ef0f69ef7bd8",
      "internal_safe_to_start_threads": "true",
      "keyring": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01/keyring",
      "log_file": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.log",
      "log_max_recent": "10000",
      "log_to_stderr": "false",
      "mds_cache_memory_limit": "53687091",
      "mds_data": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01",
      "mgr_data": "/var/lib/ceph/mgr/ceph-kavehome-mgto-pro-fs01",
      "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log",
      "mon_data": "/var/lib/ceph/mon/ceph-kavehome-mgto-pro-fs01",
      "mon_debug_dump_location": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.tdump",
      "mon_host": "10.22.0.168,10.22.0.140,10.22.0.127",
      "mon_initial_members": "kavehome-mgto-pro-fs01, kavehome-mgto-pro-fs02, kavehome-mgto-pro-fs03",
      "osd_data": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01",
      "osd_journal": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01/journal",
      "public_addr": "10.22.0.168:0/0",
      "public_network": "10.22.0.0/24",
      "rgw_data": "/var/lib/ceph/radosgw/ceph-kavehome-mgto-pro-fs01",
      "setgroup": "ceph",
      "setuser": "ceph"
    },
    "defaults": {
      "admin_socket": "",
      "auth_client_required": "cephx, none",
      "bluestore_cache_size_hdd": "1073741824",
      "bluestore_cache_size_ssd": "3221225472",
      "err_to_stderr": "false",
      "fsid": "00000000-0000-0000-0000-000000000000",
      "internal_safe_to_start_threads": "false",
      "keyring": "/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,",
      "log_file": "",
      "log_max_recent": "500",
      "log_to_stderr": "true",
      "mds_cache_memory_limit": "1073741824",
      "mds_data": "/var/lib/ceph/mds/$cluster-$id",
      "mgr_data": "/var/lib/ceph/mgr/$cluster-$id",
      "mon_cluster_log_file": "default=/var/log/ceph/$cluster.$channel.log cluster=/var/log/ceph/$cluster.log",
      "mon_data": "/var/lib/ceph/mon/$cluster-$id",
      "mon_debug_dump_location": "/var/log/ceph/$cluster-$name.tdump",
      "mon_host": "",
      "mon_initial_members": "",
      "osd_data": "/var/lib/ceph/osd/$cluster-$id",
      "osd_journal": "/var/lib/ceph/osd/$cluster-$id/journal",
      "public_addr": "-",
      "public_network": "",
      "rgw_data": "/var/lib/ceph/radosgw/$cluster-$id",
      "setgroup": "",
      "setuser": ""
    }
  },
  "unknown": []
}
----------------------------------------------------------------------
Perf Dump
----------------------------------------------------------------------
{
"AsyncMessenger::Worker-0": {"msgr_recv_messages": 1350895, "msgr_send_messages": 1593759, "msgr_recv_bytes": 301786293, "msgr_send_bytes": 341807191, "msgr_created_connections": 148, "msgr_active_connections": 45, "msgr_running_total_time": 119.217157290, "msgr_running_send_time": 39.714493374, "msgr_running_recv_time": 127.455260807, "msgr_running_fast_dispatch_time": 0.117634930},
"AsyncMessenger::Worker-1": {"msgr_recv_messages": 2996114, "msgr_send_messages": 3113274, "msgr_recv_bytes": 804875332, "msgr_send_bytes": 1231962873, "msgr_created_connections": 151, "msgr_active_connections": 48, "msgr_running_total_time": 248.962533700, "msgr_running_send_time": 83.497214869, "msgr_running_recv_time": 547.534653813, "msgr_running_fast_dispatch_time": 0.125151678},
"AsyncMessenger::Worker-2": {"msgr_recv_messages": 1793419, "msgr_send_messages": 2117240, "msgr_recv_bytes": 1425674729, "msgr_send_bytes": 871324466, "msgr_created_connections": 325, "msgr_active_connections": 54, "msgr_running_total_time": 160.001753142, "msgr_running_send_time": 49.679463024, "msgr_running_recv_time": 205.535692064, "msgr_running_fast_dispatch_time": 4.350479591},
"finisher-PurgeQueue": {"queue_len": 0, "complete_latency": {"avgcount": 755, "sum": 0.022316252, "avgtime": 0.000029557}},
"mds": {"request": 4942944, "reply": 489638, "reply_latency": {"avgcount": 489638, "sum": 771.955019623, "avgtime": 0.001576583}, "forward": 4453296, "dir_fetch": 101036, "dir_commit": 3, "dir_split": 0, "dir_merge": 0, "inode_max": 2147483647, "inodes": 505, "inodes_top": 96, "inodes_bottom": 398, "inodes_pin_tail": 11, "inodes_pinned": 367, "inodes_expired": 1556356, "inodes_with_caps": 325, "caps": 1192, "subtrees": 16, "traverse": 4956673, "traverse_hit": 496867, "traverse_forward": 4450841, "traverse_discover": 166, "traverse_dir_fetch": 1657, "traverse_remote_ino": 0, "traverse_lock": 19, "load_cent": 494278118, "q": 0, "exported": 1187, "exported_inodes": 664127, "imported": 947, "imported_inodes": 76628},
"mds_cache": {"num_strays": 0, "num_strays_delayed": 0, "num_strays_enqueuing": 0, "strays_created": 124, "strays_enqueued": 124, "strays_reintegrated": 0, "strays_migrated": 0, "num_recovering_processing": 0, "num_recovering_enqueued": 0, "num_recovering_prioritized": 0, "recovery_started": 0, "recovery_completed": 0, "ireq_enqueue_scrub": 0, "ireq_exportdir": 1189, "ireq_flush": 0, "ireq_fragmentdir": 0, "ireq_fragstats": 0, "ireq_inodestats": 0},
"mds_log": {"evadd": 125666, "evex": 116984, "evtrm": 116984, "ev": 117582, "evexg": 0, "evexd": 933, "segadd": 138, "segex": 138, "segtrm": 138, "seg": 129, "segexg": 0, "segexd": 1, "expos": 25715287703, "wrpos": 25862332030, "rdpos": 25663431097, "jlat": {"avgcount": 23473, "sum": 98.111299299, "avgtime": 0.004179751}, "replayed": 108900},
"mds_mem": {"ino": 507, "ino+": 1579334, "ino-": 1578827, "dir": 312, "dir+": 101932, "dir-": 101620, "dn": 529, "dn+": 1580751, "dn-": 1580222, "cap": 1192, "cap+": 1825843, "cap-": 1824651, "rss": 258840, "heap": 313880, "buf": 0},
"mds_server": {"dispatch_client_request": 5081829, "dispatch_server_request": 540, "handle_client_request": 4942944, "handle_client_session": 233505, "handle_slave_request": 846, "req_create": 128, "req_getattr": 38805, "req_getfilelock": 0, "req_link": 0, "req_lookup": 242216, "req_lookuphash": 0, "req_lookupino": 0, "req_lookupname": 2, "req_lookupparent": 0, "req_lookupsnap": 0, "req_lssnap": 0, "req_mkdir": 0, "req_mknod": 0, "req_mksnap": 0, "req_open": 2155, "req_readdir": 206315, "req_rename": 21, "req_renamesnap": 0, "req_rmdir": 0, "req_rmsnap": 0, "req_rmxattr": 0, "req_setattr": 2, "req_setdirlayout": 0, "req_setfilelock": 0, "req_setlayout": 0, "req_setxattr": 0, "req_symlink": 0, "req_unlink": 122},
"mds_sessions": {"session_count": 10, "session_add": 128, "session_remove": 118},
"objecter": {"op_active": 0, "op_laggy": 0, "op_send": 136767, "op_send_bytes": 202196534, "op_resend": 0, "op_reply": 136767, "op": 136767, "op_r": 101193, "op_w": 35574, "op_rmw": 0, "op_pg": 0, "osdop_stat": 5, "osdop_create": 0, "osdop_read": 150, "osdop_write": 23587, "osdop_writefull": 11750, "osdop_writesame": 0, "osdop_append": 0, "osdop_zero": 2, "osdop_truncate": 0, "osdop_delete": 228, "osdop_mapext": 0, "osdop_sparse_read": 0, "osdop_clonerange": 0, "osdop_getxattr": 100784, "osdop_setxattr": 0, "osdop_cmpxattr": 0, "osdop_rmxattr": 0, "osdop_resetxattrs": 0, "osdop_tmap_up": 0, "osdop_tmap_put": 0, "osdop_tmap_get": 0, "osdop_call": 0, "osdop_watch": 0, "osdop_notify": 0, "osdop_src_cmpxattr": 0, "osdop_pgls": 0, "osdop_pgls_filter": 0, "osdop_other": 3, "linger_active": 0, "linger_send": 0, "linger_resend": 0, "linger_ping": 0, "poolop_active": 0, "poolop_send": 0, "poolop_resend": 0, "poolstat_active": 0, "poolstat_send": 0, "poolstat_resend": 0, "statfs_active": 0, "statfs_send": 0, "statfs_resend": 0, "command_active": 0, "command_send": 0, "command_resend": 0, "map_epoch": 468, "map_full": 0, "map_inc": 39, "osd_sessions": 3, "osd_session_open": 479, "osd_session_close": 476, "osd_laggy": 0, "omap_wr": 7, "omap_rd": 202074, "omap_del": 1},
"purge_queue": {"pq_executing_ops": 0, "pq_executing": 0, "pq_executed": 124},
"throttle-msgr_dispatch_throttler-mds": {"val": 0, "max": 104857600, "get_started": 0, "get": 6140428, "get_sum": 2077944682, "get_or_fail_fail": 0, "get_or_fail_success": 6140428, "take": 0, "take_sum": 0, "put": 6140428, "put_sum": 2077944682, "wait": {"avgcount": 0, "sum": 0.000000000, "avgtime": 0.000000000}},
"throttle-objecter_bytes": {"val": 0, "max": 104857600, "get_started": 0, "get": 0, "get_sum": 0, "get_or_fail_fail": 0, "get_or_fail_success": 0, "take": 136767, "take_sum": 339484250, "put": 136523, "put_sum": 339484250, "wait": {"avgcount": 0, "sum": 0.000000000, "avgtime": 0.000000000}},
"throttle-objecter_ops": {"val": 0, "max": 1024, "get_started": 0, "get": 0, "get_sum": 0, "get_or_fail_fail": 0, "get_or_fail_success": 0, "take": 136767, "take_sum": 136767, "put": 136767, "put_sum": 136767, "wait": {"avgcount": 0, "sum": 0.000000000, "avgtime": 0.000000000}},
"throttle-write_buf_throttle": {"val": 0, "max": 3758096384, "get_started": 0, "get": 124, "get_sum": 11532, "get_or_fail_fail": 0, "get_or_fail_success": 124, "take": 0, "take_sum": 0, "put": 109, "put_sum": 11532, "wait": {"avgcount": 0, "sum": 0.000000000, "avgtime": 0.000000000}},
"throttle-write_buf_throttle-0x55faf5ba4220": {"val": 0, "max": 3758096384, "get_started": 0, "get": 125666, "get_sum": 198900816, "get_or_fail_fail": 0, "get_or_fail_success": 125666, "take": 0, "take_sum": 0, "put": 23473, "put_sum": 198900816, "wait": {"avgcount": 0, "sum": 0.000000000, "avgtime": 0.000000000}}
}
----------------------------------------------------------------------
dump_mempools
----------------------------------------------------------------------
{
"bloom_filter": {"items": 120, "bytes": 120},
"bluestore_alloc": {"items": 0, "bytes": 0},
"bluestore_cache_data": {"items": 0, "bytes": 0},
"bluestore_cache_onode": {"items": 0, "bytes": 0},
"bluestore_cache_other": {"items": 0, "bytes": 0},
"bluestore_fsck": {"items": 0, "bytes": 0},
"bluestore_txc": {"items": 0, "bytes": 0},
"bluestore_writing_deferred": {"items": 0, "bytes": 0},
"bluestore_writing": {"items": 0, "bytes": 0},
"bluefs": {"items": 0, "bytes": 0},
"buffer_anon": {"items": 96401, "bytes": 16010198},
"buffer_meta": {"items": 1, "bytes": 88},
"osd": {"items": 0, "bytes": 0},
"osd_mapbl": {"items": 0, "bytes": 0},
"osd_pglog": {"items": 0, "bytes": 0},
"osdmap": {"items": 80, "bytes": 3296},
"osdmap_mapping": {"items": 0, "bytes": 0},
"pgmap": {"items": 0, "bytes": 0},
"mds_co": {"items": 17604, "bytes": 2330840},
"unittest_1": {"items": 0, "bytes": 0},
"unittest_2": {"items": 0, "bytes": 0},
"total": {"items": 114206, "bytes": 18344542}
}
----------------------------------------------------------------------

Sorry for my English!
Greetings!!
On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco <d.carrasco@xxxxxxxxx> wrote:
> Hi, thanks for your response.
>
> Clients are about 6, and 4 of them are on standby most of the time. Only two
> are active servers that are serving the webpage. Also, we have a Varnish in
> front, so they are not getting all the load (below 30% in PHP is not much).
> About the MDS cache, I now have mds_cache_memory_limit at 8 MB. I've also tested
> 512 MB, but the CPU usage is the same and the MDS RAM usage grows up to
> 15 GB (on a 16 GB server it starts to swap and everything fails). With 8 MB, at least
> the memory usage is stable at less than 6 GB (now it is using about 1 GB of RAM).

What! Please post `ceph daemon mds.<name> config diff`, `... perf
dump`, and `... dump_mempools` from the server the active MDS is on.

We've seen reports of possible memory leaks before, and the potential
fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
Your MDS cache size should be configured to 1-8GB (depending on your
preference), so it's disturbing to see you set it so low.
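For illustration, since mds_cache_memory_limit is expressed in bytes (the default of 1073741824 in the diff above is 1 GiB), a limit in that 1-8 GB range could look like this in ceph.conf, with 4 GiB as an arbitrary example value:

    [mds]
    mds_cache_memory_limit = 4294967296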
--
Patrick Donnelly
_________________________________________
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_________________________________________
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com