Please help collect stats of Ceph monitor disk writes

Hi!

Further to my thread "Ceph 16.2.x mon compactions, disk writes" (
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/XGCI2LFW5RH3GUOQFJ542ISCSZH3FRX2/)
where we established that Ceph monitors do indeed write considerable
amounts of data to disk, I would like to ask fellow Ceph users to provide
feedback and help gather statistics on whether this happens on all
clusters or only on a specific subset of them.

The procedure is rather simple and won't take much of your time.

If you are willing to help, please follow these steps:

---------

1. Install iotop and run the following command on any of your monitor nodes:

iotop -ao -bn 2 -d 300 2>&1 | grep -E "TID|ceph-mon"

This will collect disk I/O statistics over a 5-minute window and produce
output containing the stats for the Ceph monitor threads running on the node:

    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
   4854 be/4 167           8.62 M      2.27 G  0.00 %  0.72 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:low0]
   4919 be/4 167           0.00 B     39.43 M  0.00 %  0.02 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [ms_dispatch]
   4855 be/4 167           8.00 K     19.55 M  0.00 %  0.00 % ceph-mon -n
mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false
--default-log-to-stderr=true --default-log-stderr-prefix=debug
 --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-stderr=true [rocksdb:high0]

We're particularly interested in the amount of data written (the DISK WRITE
column).
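
If you would like a single total, the accumulated DISK WRITE values of the
ceph-mon threads can be summed with a bit of awk. This is an optional
sketch: it assumes the field layout shown above (write value in field 6,
its unit in field 7) and keeps only the last accumulated sample per thread,
so adjust it if your iotop formats its output differently:

iotop -ao -bn 2 -d 300 2>&1 | grep ceph-mon | awk '
    # remember the last (accumulated) DISK WRITE value and unit per TID
    { val[$1] = $6; unit[$1] = $7 }
    END {
        for (t in val) {
            v = val[t]
            if (unit[t] == "B")      v /= 1048576   # bytes -> MiB
            else if (unit[t] == "K") v /= 1024      # KiB   -> MiB
            else if (unit[t] == "G") v *= 1024      # GiB   -> MiB
            total += v                              # "M" is already MiB
        }
        printf "ceph-mon threads wrote ~%.1f MiB in 5 minutes\n", total
    }'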

---------

2. Optional: collect the number of "manual compaction" events from the
monitor.

How to do this depends on how your monitor runs. My cluster is managed by
cephadm and the monitors run in Docker containers, so I can do something like
this, where MYMONCONTAINERID is the container ID of the Ceph monitor:

# date; d=$(date +'%Y-%m-%d'); docker logs MYMONCONTAINERID 2>&1 | grep $d
| grep -ci "manual compaction from"
Fri 13 Oct 2023 06:29:39 AM UTC
580

Alternatively, I could run the command against the log file MYMONLOGFILE,
whose location I obtained with docker inspect:

# date; d=$(date +'%Y-%m-%d'); grep $d MYMONLOGFILE | grep -ci "manual
compaction from"
Fri 13 Oct 2023 06:35:27 AM UTC
588

If you run monitors with podman or without containers, please collect this
information in whatever way is most convenient for your setup; one possible
approach is sketched below.
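
For instance, on a monitor installed from packages that logs to the journal,
something along these lines may work. This assumes the monitor name matches
the short hostname; the systemd unit name (ceph-mon@<mon-name>) and whether
the compaction messages end up in the journal or in a log file depend on
your setup, so treat this only as a starting point:

# date; journalctl -u ceph-mon@$(hostname -s) --since today | grep -ci "manual compaction from"

or, if the monitor logs to a file (commonly /var/log/ceph/ceph-mon.<name>.log):

# date; d=$(date +'%Y-%m-%d'); grep "$d" /var/log/ceph/ceph-mon.$(hostname -s).log | grep -ci "manual compaction from"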

---------

3. Optional: collect the monitor store.db size.

Usually the monitor store.db is available at
/var/lib/ceph/FSID/mon.NAME/store.db/, for example:

# du -hs
/var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
642M
 /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
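
If your cluster is managed by cephadm and the monitor name matches the short
hostname (an assumption on my part, so please verify the resulting path), the
FSID can be filled in automatically:

# du -hs /var/lib/ceph/$(ceph fsid)/mon.$(hostname -s)/store.db/

On package-based installations without cephadm, the store is usually found
under /var/lib/ceph/mon/<cluster>-<name>/store.db/ instead.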

---------

4. Optional: collect Ceph cluster version and status.

For example:

root@ceph01:/# ceph version; ceph -s
ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific
(stable)
  cluster:
    id:     3f50555a-ae2a-11eb-a2fc-ffde44714d86
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w)
    mgr: ceph01.vankui(active, since 13d), standbys: ceph02.shsinf
    osd: 96 osds: 96 up (since 2w), 95 in (since 3w)

  data:
    pools:   10 pools, 2400 pgs
    objects: 6.30M objects, 16 TiB
    usage:   61 TiB used, 716 TiB / 777 TiB avail
    pgs:     2396 active+clean
             3    active+clean+scrubbing+deep
             1    active+clean+scrubbing

  io:
    client:   71 MiB/s rd, 60 MiB/s wr, 2.94k op/s rd, 2.56k op/s wr

---------

5. Reply to this thread and submit the collected information.

For example:

1) iotop results:
... Paste data obtained in step 1)

2) manual compactions:
... Paste data obtained in step 2), or put "N/A"

3) monitor store.db size:
... Paste data obtained in step 3), or put "N/A"

4) cluster version and status:
... Paste data obtained in step 4), or put "N/A"

-------------

I would very much appreciate your effort and help with gathering these
stats. Please don't hesitate to contact me with any questions or concerns.

Best regards,

Zakhar