High CPU utilization and inexplicably slow I/O requests

We have been having similar performance issues across several Ceph clusters. When all the OSDs are up, a cluster can stay HEALTH_OK for a while, but eventually performance worsens and it goes HEALTH_WARN (at first intermittently, then continuously) with slow I/O requests blocked for longer than 32 sec. The slow requests are accompanied by "currently waiting for rw locks", but we have not found any of the network issues that are normally responsible for that warning. Examining the individual slow OSDs reported by `ceph health detail` has been unproductive: there don't seem to be any slow disks, and if we stop an affected OSD the problem just moves somewhere else. We also think this trends with the number of RBDs on the clusters rather than with the amount of Ceph I/O.

At the same time, user %CPU spikes to 95-100% simultaneously across all cores, at first frequently and then consistently. We run 12 OSDs per node on a 6-core 2.2 GHz CPU with 64 GiB RAM.

ceph1 ~ $ sudo ceph status
    cluster XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
     health HEALTH_WARN
            547 requests are blocked > 32 sec
     monmap e1: 3 mons at {cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0}
            election epoch 16, quorum 0,1,2 cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
     osdmap e577122: 72 osds: 68 up, 68 in
            flags sortbitwise,require_jewel_osds
      pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
            126 TB used, 368 TB / 494 TB avail
                4084 active+clean
                  12 active+clean+scrubbing+deep
  client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr

ceph1 ~ $ vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd    free   buff    cache   si   so    bi    bo    in     cs us sy id wa st
27  1      0 3112660 165544 36261692    0    0   472  1274     0      1 22  1 76  1  0
25  0      0 3126176 165544 36246508    0    0   858 12692 12122 110478 97  2  1  0  0
22  0      0 3114284 165544 36258136    0    0     1  6118  9586 118625 97  2  1  0  0
11  0      0 3096508 165544 36276244    0    0     8  6762 10047 188618 89  3  8  0  0
18  0      0 2990452 165544 36384048    0    0  1209 21170 11179 179878 85  4 11  0  0

There is no apparent memory shortage, and none of the HDDs or SSDs show consistently high utilization, slow service times, or any other sign of hardware saturation; the only resource that looks saturated is user CPU. Can CPU starvation be responsible for "waiting for rw locks"?

Our main pool (the one with all the data) currently has 1024 PGs, which leaves us room to add more PGs if needed, but we're concerned that doing so would consume even more CPU. We have switched the OSDs from tcmalloc to jemalloc, and that has helped with CPU utilization somewhat, but we still see occurrences of 95-100% CPU with a not terribly high Ceph workload. Any suggestions of what else to look at?

We have a peculiar use case where we have many RBDs but only about 1-5% of them are active at the same time, and we are constantly creating and expiring RBD snapshots. Could this lead to aberrant performance? For instance, is it normal to have ~40k snaps still in cached_removed_snaps?
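In case it is useful, this is roughly how we have been poking at an individual blocked OSD. osd.12 below is just a placeholder id, and we grep the op tracker output on the assumption that the blocked ops record a "waiting for rw locks" event:

ceph1 ~ $ sudo ceph daemon osd.12 dump_ops_in_flight | grep -B5 'waiting for rw locks'
ceph1 ~ $ sudo ceph daemon osd.12 dump_historic_ops | less
ceph1 ~ $ pgrep -af ceph-osd       # find the pid of that OSD's process
ceph1 ~ $ top -H -p <pid>          # per-thread CPU for that ceph-osd
ceph1 ~ $ sudo ceph osd dump | grep removed_snaps   # rough view of how large the removed_snaps interval set has grown (we assume this is what cached_removed_snaps mirrors)

So far none of that has pointed at any one disk or thread, which matches what we see from `ceph health detail`.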
For completeness, our ceph.conf (addresses redacted):

[global]
cluster = XXXXXXXX
fsid = XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
keyring = /etc/ceph/ceph.keyring
auth_cluster_required = none
auth_service_required = none
auth_client_required = none
mon_host = cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
mon_addr = XXX.XXX.XXX.XXX:XXXX,XXX.XXX.XXX.XXX:XXX,XXX.XXX.XXX.XXX:XXXX
mon_initial_members = cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
cluster_network = 172.20.0.0/18
public_network = XXX.XXX.XXX.XXX/20
mon osd full ratio = .80
mon osd nearfull ratio = .60
rbd default format = 2
rbd default order = 25
rbd_default_features = 1
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 1024
osd pool default pgp num = 1024
osd_recovery_op_priority = 1
osd_max_backfills = 1
osd_recovery_threads = 1
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
osd_scrub_thread_suicide_timeout = 300
osd scrub during recovery = false
osd scrub sleep = 60
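For what it's worth, this is how we convinced ourselves the OSDs really are running with jemalloc rather than tcmalloc (just checking the first ceph-osd pid on a node; on our nodes this shows libjemalloc in the mapped libraries):

ceph1 ~ $ sudo grep -E -m1 -o 'lib(jemalloc|tcmalloc)[^ ]*' /proc/$(pgrep ceph-osd | head -1)/maps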