Thank you John,
All my servers are Ubuntu 14.04 with the 3.16 kernel.
Not all clients exhibit this problem; the cluster seems to be functioning well now.
As you suggest, I will change the mds_cache_size from 100000 to 500000 and test it. Thanks again!
2015-07-10 17:00 GMT+08:00 John Spray <john.spray@xxxxxxxxxx>:
This is usually caused by use of older kernel clients. I don't remember exactly what version it was fixed in, but iirc we've seen the problem with 3.14 and seen it go away with 3.18.
If your system is otherwise functioning well, this is not a critical error -- it just means that the MDS might not be able to fully control its memory usage (i.e. it can exceed mds_cache_size).
John
On 10/07/2015 05:25, 谷枫 wrote:
hi,
I use CephFS in a production environment with 7 OSDs, 1 MDS, and 3 MONs now.
So far so good, but I have a problem with it today.
The ceph status report this:
cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
health HEALTH_WARN
mds0: Client 34271 failing to respond to cache pressure
mds0: Client 74175 failing to respond to cache pressure
mds0: Client 74181 failing to respond to cache pressure
mds0: Client 34247 failing to respond to cache pressure
mds0: Client 64162 failing to respond to cache pressure
mds0: Client 136744 failing to respond to cache pressure
monmap e2: 3 mons at {node01=10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
election epoch 186, quorum 0,1,2 node01,node02,node03
mdsmap e46: 1/1/1 up {0=tree01=up:active}
osdmap e717: 7 osds: 7 up, 7 in
pgmap v995836: 264 pgs, 3 pools, 51544 MB data, 118 kobjects
138 GB used, 1364 GB / 1502 GB avail
264 active+clean
client io 1018 B/s rd, 1273 B/s wr, 0 op/s
Yesterday I added two OSDs with version 0.94.2; the other, older OSDs are version 0.94.1.
So the question is: does this matter?
What does the warning mean, and how can I solve this problem? Thanks!
This is my cluster config message with mds:
"name": "mds.tree01",
"debug_mds": "1\/5",
"debug_mds_balancer": "1\/5",
"debug_mds_locker": "1\/5",
"debug_mds_log": "1\/5",
"debug_mds_log_expire": "1\/5",
"debug_mds_migrator": "1\/5",
"admin_socket": "\/var\/run\/ceph\/ceph-mds.tree01.asok",
"log_file": "\/var\/log\/ceph\/ceph-mds.tree01.log",
"keyring": "\/var\/lib\/ceph\/mds\/ceph-tree01\/keyring",
"mon_max_mdsmap_epochs": "500",
"mon_mds_force_trim_to": "0",
"mon_debug_dump_location": "\/var\/log\/ceph\/ceph-mds.tree01.tdump",
"client_use_random_mds": "false",
"mds_data": "\/var\/lib\/ceph\/mds\/ceph-tree01",
"mds_max_file_size": "1099511627776",
"mds_cache_size": "100000",
"mds_cache_mid": "0.7",
"mds_max_file_recover": "32",
"mds_mem_max": "1048576",
"mds_dir_max_commit_size": "10",
"mds_decay_halflife": "5",
"mds_beacon_interval": "4",
"mds_beacon_grace": "15",
"mds_enforce_unique_name": "true",
"mds_blacklist_interval": "1440",
"mds_session_timeout": "120",
"mds_revoke_cap_timeout": "60",
"mds_recall_state_timeout": "60",
"mds_freeze_tree_timeout": "30",
"mds_session_autoclose": "600",
"mds_health_summarize_threshold": "10",
"mds_reconnect_timeout": "45",
"mds_tick_interval": "5",
"mds_dirstat_min_interval": "1",
"mds_scatter_nudge_interval": "5",
"mds_client_prealloc_inos": "1000",
"mds_early_reply": "true",
"mds_default_dir_hash": "2",
"mds_log": "true",
"mds_log_skip_corrupt_events": "false",
"mds_log_max_events": "-1",
"mds_log_events_per_segment": "1024",
"mds_log_segment_size": "0",
"mds_log_max_segments": "30",
"mds_log_max_expiring": "20",
"mds_bal_sample_interval": "3",
"mds_bal_replicate_threshold": "8000",
"mds_bal_unreplicate_threshold": "0",
"mds_bal_frag": "false",
"mds_bal_split_size": "10000",
"mds_bal_split_rd": "25000",
"mds_bal_split_wr": "10000",
"mds_bal_split_bits": "3",
"mds_bal_merge_size": "50",
"mds_bal_merge_rd": "1000",
"mds_bal_merge_wr": "1000",
"mds_bal_interval": "10",
"mds_bal_fragment_interval": "5",
"mds_bal_idle_threshold": "0",
"mds_bal_max": "-1",
"mds_bal_max_until": "-1",
"mds_bal_mode": "0",
"mds_bal_min_rebalance": "0.1",
"mds_bal_min_start": "0.2",
"mds_bal_need_min": "0.8",
"mds_bal_need_max": "1.2",
"mds_bal_midchunk": "0.3",
"mds_bal_minchunk": "0.001",
"mds_bal_target_removal_min": "5",
"mds_bal_target_removal_max": "10",
"mds_replay_interval": "1",
"mds_shutdown_check": "0",
"mds_thrash_exports": "0",
"mds_thrash_fragments": "0",
"mds_dump_cache_on_map": "false",
"mds_dump_cache_after_rejoin": "false",
"mds_verify_scatter": "false",
"mds_debug_scatterstat": "false",
"mds_debug_frag": "false",
"mds_debug_auth_pins": "false",
"mds_debug_subtrees": "false",
"mds_kill_mdstable_at": "0",
"mds_kill_export_at": "0",
"mds_kill_import_at": "0",
"mds_kill_link_at": "0",
"mds_kill_rename_at": "0",
"mds_kill_openc_at": "0",
"mds_kill_journal_at": "0",
"mds_kill_journal_expire_at": "0",
"mds_kill_journal_replay_at": "0",
"mds_journal_format": "1",
"mds_kill_create_at": "0",
"mds_inject_traceless_reply_probability": "0",
"mds_wipe_sessions": "false",
"mds_wipe_ino_prealloc": "false",
"mds_skip_ino": "0",
"max_mds": "1",
"mds_standby_for_name": "",
"mds_standby_for_rank": "-1",
"mds_standby_replay": "false",
"mds_enable_op_tracker": "true",
"mds_op_history_size": "20",
"mds_op_history_duration": "600",
"mds_op_complaint_time": "30",
"mds_op_log_threshold": "5",
"mds_snap_min_uid": "0",
"mds_snap_max_uid": "65536",
"mds_verify_backtrace": "1",
"mds_action_on_write_error": "1",
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com