Re: [EXTERN] Please help collecting stats of Ceph monitor disk writes

Hi,

This is from our Nautilus cluster; I'm not sure whether it is relevant, but here are the results:

1) iotop results:
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
   1801 be/4 ceph         0.00 B    108.10 M  0.00 %  0.02 % ceph-mon -f --cluster ceph --id cephmon-01 --setuser ceph --setgroup ceph [rocksdb:low0]
   1840 be/4 ceph         0.00 B      3.95 M  0.00 %  0.00 % ceph-mon -f --cluster ceph --id cephmon-01 --setuser ceph --setgroup ceph [fn_monstore]
   1859 be/4 ceph         0.00 B      3.89 M  0.00 %  0.00 % ceph-mon -f --cluster ceph --id cephmon-01 --setuser ceph --setgroup ceph [safe_timer]
   1802 be/4 ceph         0.00 B      2.82 M  0.00 %  0.00 % ceph-mon -f --cluster ceph --id cephmon-01 --setuser ceph --setgroup ceph [rocksdb:high0]
   1742 be/4 ceph         0.00 B     56.00 K  0.00 %  0.00 % ceph-mon -f --cluster ceph --id cephmon-01 --setuser ceph --setgroup ceph [log]

2) manual compactions:
Fri Oct 13 10:11:59 CEST 2023
127

3) monitor store.db size:
165M    /var/lib/ceph/mon/ceph-cephmon-01/store.db/

4) cluster version and status:
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
  cluster:
    id:     43587260-09d9-4f3b-b118-365b5fc8ab64
    health: HEALTH_WARN
            1 clients failing to respond to cache pressure

  services:
    mon: 3 daemons, quorum cephmon-01,cephmon-02,cephmon-03 (age 6M)
    mgr: cephmon-02(active, since 15M), standbys: cephmon-03, cephmon-01
    mds: cephfs:1 {0=cephmds-02=up:active} 1 up:standby-replay 1 up:standby
    osd: 240 osds: 240 up (since 6d), 240 in (since 6d)

  data:
    pools:   3 pools, 5184 pgs
    objects: 276.38M objects, 719 TiB
    usage:   1.1 PiB used, 498 TiB / 1.6 PiB avail
    pgs:     5175 active+clean
             9    active+clean+scrubbing+deep

  io:
    client:   15 MiB/s rd, 1.8 MiB/s wr, 50 op/s rd, 3 op/s wr

On 10/13/23 08:58, Zakhar Kirpichenko wrote:
Hi!

Further to my thread "Ceph 16.2.x mon compactions, disk writes"
(https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/XGCI2LFW5RH3GUOQFJ542ISCSZH3FRX2/),
in which we established that Ceph monitors do write considerable amounts
of data to disk, I would like to ask fellow Ceph users to provide
feedback and help gather statistics on whether this happens on all
clusters or only on a specific subset of them.

The procedure is rather simple and won't take much of your time.

If you are willing to help, please follow these steps:

---------

1. Install iotop and run the following command on any of your monitor nodes:

iotop -ao -bn 2 -d 300 2>&1 | grep -E "TID|ceph-mon"

This will collect 5 minutes of disk I/O statistics and produce output
containing the stats for the Ceph monitor threads running on the node:

     TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
    4854 be/4 167           8.62 M      2.27 G  0.00 %  0.72 % ceph-mon -n mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true [rocksdb:low0]
    4919 be/4 167           0.00 B     39.43 M  0.00 %  0.02 % ceph-mon -n mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true [ms_dispatch]
    4855 be/4 167           8.00 K     19.55 M  0.00 %  0.00 % ceph-mon -n mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true [rocksdb:high0]

We're particularly interested in the amount of written data.
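
If you would rather report a single total than per-thread figures, the
same 5-minute sample can be re-run and summed, e.g. with awk. This is
only a rough sketch and entirely optional; it assumes the column layout
shown above (DISK WRITE value and unit in fields 6 and 7):

# sketch: sum the DISK WRITE column of the ceph-mon threads (fields 6/7 assumed)
iotop -ao -bn 2 -d 300 2>&1 | grep "ceph-mon" | \
  awk '{v=$6; u=$7; if (u=="B") v/=1048576; else if (u=="K") v/=1024; else if (u=="G") v*=1024; sum+=v}
       END {printf "ceph-mon total writes: %.1f MiB in 5 minutes\n", sum}'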

---------

2. Optional: collect the number of "manual compaction" events from the
monitor.

This step will depend on how your monitors run. My cluster is managed by
cephadm and the monitors run in Docker containers, so I can do something
like this, where MYMONCONTAINERID is the container ID of the Ceph monitor:

# date; d=$(date +'%Y-%m-%d'); docker logs MYMONCONTAINERID 2>&1 | grep $d | grep -ci "manual compaction from"
Fri 13 Oct 2023 06:29:39 AM UTC
580

Alternatively, I could run the command against the log file MYMONLOGFILE,
whose location I obtained with docker inspect:

# date; d=$(date +'%Y-%m-%d'); grep $d MYMONLOGFILE | grep -ci "manual compaction from"
Fri 13 Oct 2023 06:35:27 AM UTC
588

If you run monitors with podman or without containerization, please get
this information the way that is most convenient in your setup.
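
For example, on a package-based (non-containerized) monitor the same
count can usually be taken from the monitor log file directly. The path
below is an assumption (the usual default for package installs, with the
monitor ID assumed to equal the short hostname), so adjust it to your
setup:

# date; d=$(date +'%Y-%m-%d'); grep $d /var/log/ceph/ceph-mon.$(hostname -s).log | grep -ci "manual compaction from"

With podman, "podman logs <container>" should work in place of
"docker logs" in the command above.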

---------

3. Optional: collect the monitor store.db size.

Usually the monitor store.db is available at
/var/lib/ceph/FSID/mon.NAME/store.db/, for example:

# du -hs /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
642M    /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db/
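
On non-cephadm (package-based) deployments the store is typically under
/var/lib/ceph/mon/ceph-NAME/store.db/ instead; for example (assuming the
monitor name is the short hostname):

# du -hs /var/lib/ceph/mon/ceph-$(hostname -s)/store.db/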

---------

4. Optional: collect Ceph cluster version and status.

For example:

root@ceph01:/# ceph version; ceph -s
ceph version 16.2.14 (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific
(stable)
   cluster:
     id:     3f50555a-ae2a-11eb-a2fc-ffde44714d86
     health: HEALTH_OK

   services:
     mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w)
     mgr: ceph01.vankui(active, since 13d), standbys: ceph02.shsinf
     osd: 96 osds: 96 up (since 2w), 95 in (since 3w)

   data:
     pools:   10 pools, 2400 pgs
     objects: 6.30M objects, 16 TiB
     usage:   61 TiB used, 716 TiB / 777 TiB avail
     pgs:     2396 active+clean
              3    active+clean+scrubbing+deep
              1    active+clean+scrubbing

   io:
     client:   71 MiB/s rd, 60 MiB/s wr, 2.94k op/s rd, 2.56k op/s wr

---------

5. Reply to this thread and submit the collected information.

For example:

1) iotop results:
... Paste data obtained in step 1)

2) manual compactions:
... Paste data obtained in step 2), or put "N/A"

3) monitor store.db size:
... Paste data obtained in step 3), or put "N/A"

4) cluster version and status:
... Paste data obtained in step 4), or put "N/A"

-------------

I would very much appreciate your effort and help with gathering these
stats. Please don't hesitate to contact me with any questions or concerns.

Best regards,

Zakhar


--
_________________________________________________________
D i e t m a r  R i e d e r, Mag.Dr.
Head of Bioinformatics Core Facility
Innsbruck Medical University
Biocenter - Institute of Bioinformatics
Innrain 80, 6020 Innsbruck
Phone: +43 512 9003 71402 | Mobile: +43 676 8716 72402
Email: dietmar.rieder@xxxxxxxxxxx
Web:   http://www.icbi.at



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
