ceph df reports incorrect stats

Dear fellow cephers,

we have a problem with ceph df: it reports incorrect USED values. It would be great if someone could look at this; if a ceph operator doesn't discover the issue, they might run out of space without noticing.

This has been reported before but didn't get much attention:

https://www.spinics.net/lists/ceph-users/msg74602.html
https://www.spinics.net/lists/ceph-users/msg74630.html

The symptom: STORED=USED in the output of ceph df. All reports I know of are for octopus clusters, but I suspect that newer versions are affected as well. I don't have a reproducer yet (still lacking a test cluster).
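
For anyone who wants to check their own cluster, here is a minimal, untested sketch of how one could scan for affected pools. It assumes the "stored" and "bytes_used" fields that "ceph df --format json" emits on our octopus cluster; check the field names on your release:

#!/usr/bin/env python3
# Sketch: flag pools for which "ceph df" reports USED == STORED. With
# replication (size > 1) or EC overhead, USED should clearly exceed STORED.
# Field names are taken from the octopus JSON output; adjust if needed.
import json
import subprocess

def ceph_df():
    out = subprocess.run(["ceph", "df", "--format", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)

def suspicious_pools(df):
    for pool in df.get("pools", []):
        stats = pool.get("stats", {})
        stored = stats.get("stored", 0)
        used = stats.get("bytes_used", 0)
        if stored > 0 and used == stored:
            yield pool["name"], used

if __name__ == "__main__":
    for name, used in suspicious_pools(ceph_df()):
        print("possible bad accounting: pool %s has STORED == USED == %d bytes"
              % (name, used))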

Here is a correct usage report:

==> logs/health_231203.log <==
--- RAW STORAGE ---
CLASS     SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd        13 PiB  7.8 PiB  4.8 PiB   4.8 PiB      38.29
 
--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
con-fs2-data           14  2048  1.1 PiB  402.93M  1.2 PiB  20.95    3.7 PiB
con-fs2-data2          19  8192  2.7 PiB    1.10G  3.4 PiB  42.78    3.3 PiB


Here is an incorrect one:

==> logs/health_231204.log <==
--- RAW STORAGE ---
CLASS     SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd        13 PiB  7.8 PiB  4.8 PiB   4.8 PiB      38.06
 
--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
con-fs2-data           14  2048  1.1 PiB  402.93M  1.1 PiB  18.82    3.6 PiB
con-fs2-data2          19  8192  2.7 PiB    1.10G  2.7 PiB  37.09    3.3 PiB


That the first report is correct and not the second is supported by the output of ceph osd df tree, which shows a usage of 4.6 PiB, in line with the first ceph df output. Note that the ceph osd df tree output was taken on the same date as the incorrect ceph df output; hence, ceph osd df tree is *not* affected by this issue:

==> ceph osd df tree 231204 <==
SIZE      RAW USE  DATA     OMAP     META     AVAIL    NAME                         
  12 PiB  4.6 PiB  4.6 PiB  2.2 TiB   19 TiB  7.5 PiB  datacenter ContainerSquare
   0 B      0 B      0 B      0 B      0 B      0 B      room CON-161-A        
  12 PiB  4.6 PiB  4.6 PiB  2.2 TiB   19 TiB  7.5 PiB      room CON-161-A1       
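
Since the RAW STORAGE totals stay correct, a related cross-check can be scripted: compare the sum of the per-pool USED values against the raw usage reported by the same ceph df call. A sketch, again assuming the octopus JSON field names (in particular "total_used_raw_bytes" under "stats"):

#!/usr/bin/env python3
# Sketch: compare the sum of per-pool USED with the cluster-wide raw usage
# reported by the same "ceph df --format json" call. When the per-pool
# numbers flip to USED == STORED, the pool sum falls far short of the raw
# total (the RAW STORAGE section stays correct in our case).
import json
import subprocess

df = json.loads(subprocess.run(["ceph", "df", "--format", "json"],
                               check=True, capture_output=True,
                               text=True).stdout)

pool_used = sum(p["stats"].get("bytes_used", 0) for p in df.get("pools", []))
raw_used = df.get("stats", {}).get("total_used_raw_bytes", 0)

print("sum of pool USED: %.2f PiB" % (pool_used / 2**50))
print("raw USED:         %.2f PiB" % (raw_used / 2**50))
if raw_used and pool_used / raw_used < 0.9:   # arbitrary threshold
    print("per-pool USED accounts for much less than raw usage -- "
          "the pool-level stats are probably the broken ones")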


In our case, the problem showed up out of nowhere. Here is the log snippet covering the time window within which the flip happened (compare the lines for the con-fs2-data and con-fs2-data2 pools):

==> logs/health_231203.log <==
ceph status/df/pool stats/health detail at 16:30:03:
  cluster:
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 3M)
    mgr: ceph-25(active, since 2M), standbys: ceph-26, ceph-01, ceph-03, ceph-02
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1284 osds: 1279 up (since 14h), 1279 in (since 2w)
 
  task status:
 
  data:
    pools:   14 pools, 25065 pgs
    objects: 2.23G objects, 4.0 PiB
    usage:   5.0 PiB used, 8.1 PiB / 13 PiB avail
    pgs:     25035 active+clean
             29    active+clean+scrubbing+deep
             1     active+clean+scrubbing
 
  io:
    client:   215 MiB/s rd, 140 MiB/s wr, 2.34k op/s rd, 1.89k op/s wr
 
--- RAW STORAGE ---
CLASS     SIZE     AVAIL    USED     RAW USED  %RAW USED
fs_meta    51 TiB   45 TiB  831 GiB   6.0 TiB      11.84
hdd        13 PiB  7.8 PiB  4.8 PiB   4.8 PiB      38.08
rbd_data  283 TiB  171 TiB  111 TiB   112 TiB      39.44
rbd_perf   42 TiB   22 TiB   20 TiB    20 TiB      48.60
TOTAL      13 PiB  8.1 PiB  4.9 PiB   5.0 PiB      38.04
 
--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
sr-rbd-meta-one         1   128   13 GiB   16.57k   38 GiB   0.03     39 TiB
sr-rbd-data-one         2  4096  121 TiB   32.06M  108 TiB  48.08     88 TiB
sr-rbd-one-stretch      3   160  262 GiB   68.81k  573 GiB   0.48     39 TiB
con-rbd-meta-hpc-one    7    50   12 KiB       45  372 KiB      0    9.2 TiB
con-rbd-data-hpc-one    8   150   24 GiB    6.10k   24 GiB      0    3.6 PiB
sr-rbd-data-one-hdd    11  1024  137 TiB   35.95M  193 TiB  46.57    166 TiB
con-fs2-meta1          12   512  554 GiB   76.76M  2.2 TiB   7.26    6.9 TiB
con-fs2-meta2          13  4096      0 B  574.23M      0 B      0    6.9 TiB
con-fs2-data           14  2048  1.1 PiB  402.93M  1.2 PiB  21.09    3.6 PiB
con-fs2-data-ec-ssd    17   256  700 GiB    7.27M  706 GiB   2.44     22 TiB
ms-rbd-one             18   256  805 GiB  210.92k  1.4 TiB   1.18     39 TiB
con-fs2-data2          19  8192  2.7 PiB    1.10G  3.4 PiB  42.96    3.3 PiB
sr-rbd-data-one-perf   20  4096  6.8 TiB    1.81M   20 TiB  57.09    5.1 TiB
device_health_metrics  21     1  1.4 GiB    1.11k  4.2 GiB      0     39 TiB


ceph status/df/pool stats/health detail at 16:30:10:
  cluster:
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 3M)
    mgr: ceph-25(active, since 2M), standbys: ceph-26, ceph-01, ceph-03, ceph-02
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1284 osds: 1279 up (since 14h), 1279 in (since 2w)
 
  task status:
 
  data:
    pools:   14 pools, 25065 pgs
    objects: 2.23G objects, 4.0 PiB
    usage:   5.0 PiB used, 8.1 PiB / 13 PiB avail
    pgs:     25035 active+clean
             29    active+clean+scrubbing+deep
             1     active+clean+scrubbing
 
  io:
    client:   241 MiB/s rd, 174 MiB/s wr, 2.68k op/s rd, 2.34k op/s wr
 
--- RAW STORAGE ---
CLASS     SIZE     AVAIL    USED     RAW USED  %RAW USED
fs_meta    51 TiB   45 TiB  830 GiB   6.0 TiB      11.84
hdd        13 PiB  7.8 PiB  4.8 PiB   4.8 PiB      38.08
rbd_data  283 TiB  171 TiB  111 TiB   112 TiB      39.44
rbd_perf   42 TiB   22 TiB   20 TiB    20 TiB      48.60
TOTAL      13 PiB  8.1 PiB  4.9 PiB   5.0 PiB      38.04
 
--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
sr-rbd-meta-one         1   128   13 GiB   16.57k   13 GiB   0.01     39 TiB
sr-rbd-data-one         2  4096   92 TiB   32.06M   92 TiB  44.11     88 TiB
sr-rbd-one-stretch      3   160  222 GiB   68.81k  222 GiB   0.19     39 TiB
con-rbd-meta-hpc-one    7    50  6.9 KiB       45  6.9 KiB      0    9.2 TiB
con-rbd-data-hpc-one    8   150   23 GiB    6.10k   23 GiB      0    3.6 PiB
sr-rbd-data-one-hdd    11  1024  135 TiB   35.95M  135 TiB  37.88    166 TiB
con-fs2-meta1          12   512  367 GiB   76.76M  367 GiB   1.28    6.9 TiB
con-fs2-meta2          13  4096      0 B  574.23M      0 B      0    6.9 TiB
con-fs2-data           14  2048  1.1 PiB  402.93M  1.1 PiB  18.82    3.6 PiB
con-fs2-data-ec-ssd    17   256  515 GiB    7.27M  515 GiB   1.79     22 TiB
ms-rbd-one             18   256  579 GiB  210.92k  579 GiB   0.48     39 TiB
con-fs2-data2          19  8192  2.7 PiB    1.10G  2.7 PiB  37.09    3.3 PiB
sr-rbd-data-one-perf   20  4096  6.9 TiB    1.81M  6.9 TiB  31.29    5.1 TiB
device_health_metrics  21     1  1.2 GiB    1.11k  1.2 GiB      0     39 TiB
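
To pin down the exact moment of such a flip, one could poll ceph df and log whenever the USED/STORED ratio of a pool collapses to roughly 1 (or recovers). A rough sketch along those lines, with the same assumptions about the JSON field names as above:

#!/usr/bin/env python3
# Sketch: poll "ceph df" and print a timestamp whenever the USED/STORED
# ratio of a watched pool collapses to ~1 or recovers, to narrow down the
# window in which the flip happens.
import json
import subprocess
import time

POOLS = ["con-fs2-data", "con-fs2-data2"]   # pools to watch
INTERVAL = 5                                # seconds between polls
THRESHOLD = 1.05                            # ratio below this counts as flipped

def ratios():
    df = json.loads(subprocess.run(["ceph", "df", "--format", "json"],
                                   check=True, capture_output=True,
                                   text=True).stdout)
    return {p["name"]: p["stats"]["bytes_used"] / max(p["stats"]["stored"], 1)
            for p in df.get("pools", []) if p["name"] in POOLS}

last = ratios()
while True:
    time.sleep(INTERVAL)
    current = ratios()
    for name, ratio in current.items():
        before = last.get(name, ratio)
        if (ratio < THRESHOLD) != (before < THRESHOLD):
            state = "flipped to USED==STORED" if ratio < THRESHOLD else "recovered"
            print("%s pool %s %s (ratio %.2f)"
                  % (time.strftime("%Y-%m-%d %H:%M:%S"), name, state, ratio))
    last = current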

For us, the issue disappeared after taking down some OSDs in a second crush root. These OSDs had been moved there for draining; we use a second crush root for that purpose. Here is the log snippet covering the time window within which the back-flip to correct reporting happened:

==> logs/health_231205.log <==
ceph status/df/pool stats/health detail at 17:42:58:
  cluster:
    health: HEALTH_WARN
            1 osds down
            24 hosts (12 osds) down
            1 root (12 osds) down
 
  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 3M)
    mgr: ceph-25(active, since 2M), standbys: ceph-26, ceph-01, ceph-03, ceph-02
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1284 osds: 1267 up (since 19m), 1268 in (since 0.401448s)
 
  task status:
 
  data:
    pools:   14 pools, 25065 pgs
    objects: 2.23G objects, 4.0 PiB
    usage:   5.0 PiB used, 8.0 PiB / 13 PiB avail
    pgs:     25034 active+clean
             31    active+clean+scrubbing+deep
 
  io:
    client:   118 MiB/s rd, 789 MiB/s wr, 1.75k op/s rd, 2.14k op/s wr
 
--- RAW STORAGE ---
CLASS     SIZE     AVAIL    USED     RAW USED  %RAW USED
fs_meta    51 TiB   45 TiB  731 GiB   5.9 TiB      11.65
hdd        13 PiB  7.8 PiB  4.8 PiB   4.8 PiB      38.36
rbd_data  283 TiB  171 TiB  111 TiB   112 TiB      39.59
rbd_perf   42 TiB   22 TiB   20 TiB    20 TiB      48.19
TOTAL      13 PiB  8.0 PiB  4.9 PiB   5.0 PiB      38.32
 
--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
sr-rbd-meta-one         1   128   14 GiB   16.94k   14 GiB   0.01     39 TiB
sr-rbd-data-one         2  4096   93 TiB   32.32M   93 TiB  44.29     88 TiB
sr-rbd-one-stretch      3   160  222 GiB   68.81k  222 GiB   0.19     39 TiB
con-rbd-meta-hpc-one    7    50  6.9 KiB       45  6.9 KiB      0    9.2 TiB
con-rbd-data-hpc-one    8   150   23 GiB    6.10k   23 GiB      0    3.6 PiB
sr-rbd-data-one-hdd    11  1024  135 TiB   36.08M  135 TiB  38.00    165 TiB
con-fs2-meta1          12   512  367 GiB   76.81M  367 GiB   1.28    6.9 TiB
con-fs2-meta2          13  4096      0 B  572.65M      0 B      0    6.9 TiB
con-fs2-data           14  2048  1.1 PiB  402.93M  1.1 PiB  18.83    3.6 PiB
con-fs2-data-ec-ssd    17   256  515 GiB    7.27M  515 GiB   1.78     22 TiB
ms-rbd-one             18   256  579 GiB  210.92k  579 GiB   0.48     39 TiB
con-fs2-data2          19  8192  2.7 PiB    1.10G  2.7 PiB  37.16    3.3 PiB
sr-rbd-data-one-perf   20  4096  6.9 TiB    1.81M  6.9 TiB  31.07    5.1 TiB
device_health_metrics  21     1  1.2 GiB    1.11k  1.2 GiB      0     39 TiB


ceph status/df/pool stats/health detail at 17:43:04:
  cluster:
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 3M)
    mgr: ceph-25(active, since 2M), standbys: ceph-26, ceph-01, ceph-03, ceph-02
    mds: con-fs2:8 4 up:standby 8 up:active
    osd: 1284 osds: 1267 up (since 19m), 1267 in (since 6s)
 
  task status:
 
  data:
    pools:   14 pools, 25065 pgs
    objects: 2.23G objects, 4.0 PiB
    usage:   5.0 PiB used, 8.0 PiB / 13 PiB avail
    pgs:     25035 active+clean
             30    active+clean+scrubbing+deep
 
  io:
    client:   151 MiB/s rd, 840 MiB/s wr, 2.13k op/s rd, 2.10k op/s wr
 
--- RAW STORAGE ---
CLASS     SIZE     AVAIL    USED     RAW USED  %RAW USED
fs_meta    51 TiB   45 TiB  731 GiB   5.9 TiB      11.65
hdd        13 PiB  7.7 PiB  4.8 PiB   4.8 PiB      38.42
rbd_data  283 TiB  171 TiB  111 TiB   112 TiB      39.59
rbd_perf   42 TiB   22 TiB   20 TiB    20 TiB      48.19
TOTAL      13 PiB  8.0 PiB  4.9 PiB   5.0 PiB      38.37
 
--- POOLS ---
POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
sr-rbd-meta-one         1   128   14 GiB   16.94k   42 GiB   0.04     39 TiB
sr-rbd-data-one         2  4096  122 TiB   32.32M  109 TiB  48.26     88 TiB
sr-rbd-one-stretch      3   160  262 GiB   68.81k  573 GiB   0.48     39 TiB
con-rbd-meta-hpc-one    7    50   11 KiB       45  368 KiB      0    9.2 TiB
con-rbd-data-hpc-one    8   150   24 GiB    6.10k   24 GiB      0    3.6 PiB
sr-rbd-data-one-hdd    11  1024  138 TiB   36.08M  193 TiB  46.69    165 TiB
con-fs2-meta1          12   512  555 GiB   76.81M  2.2 TiB   7.26    6.9 TiB
con-fs2-meta2          13  4096      0 B  572.65M      0 B      0    6.9 TiB
con-fs2-data           14  2048  1.1 PiB  402.93M  1.2 PiB  21.09    3.6 PiB
con-fs2-data-ec-ssd    17   256  700 GiB    7.27M  706 GiB   2.43     22 TiB
ms-rbd-one             18   256  805 GiB  210.92k  1.4 TiB   1.18     39 TiB
con-fs2-data2          19  8192  2.7 PiB    1.10G  3.4 PiB  43.01    3.3 PiB
sr-rbd-data-one-perf   20  4096  6.8 TiB    1.81M   20 TiB  56.75    5.1 TiB
device_health_metrics  21     1  1.4 GiB    1.11k  4.2 GiB      0     39 TiB

This leads me to suspect that having multiple crush roots might be a contributing factor. Our crush tree looks like this (OSDs removed); it has 3 different roots (BB, DTU and default):

ID    CLASS     WEIGHT       TYPE NAME                           STATUS  REWEIGHT  PRI-AFF
 -78              106.92188  root BB                                                      
 -99                      0      host bb-04                                               
-102                      0      host bb-05                                               
-105                      0      host bb-06                                               
-325                      0      host bb-06-old                                           
-108                      0      host bb-07                                               
-331                      0      host bb-07-old                                           
  -3                8.91016      host bb-08                                               
  -9                8.91016      host bb-09                                               
 -18                8.91016      host bb-10                                               
 -21                8.91016      host bb-11                                               
 -28                8.91016      host bb-12                                               
 -34                8.91016      host bb-13                                               
 -72                8.91016      host bb-14                                               
 -75                8.91016      host bb-15                                               
-111                8.91016      host bb-16                                               
-114                8.91016      host bb-17                                               
-117                      0      host bb-18                                               
-142                      0      host bb-19                                               
-145                      0      host bb-20                                               
-241                      0      host bb-21                                               
-246                      0      host bb-22                                               
-251                8.91016      host bb-23                                               
-256                8.91016      host bb-24                                               
-151                      0      host bb-office                                           
 -40            14614.77832  root DTU                                                     
 -42                      0      region Lyngby                                            
 -41            14614.77832      region Risoe                                             
 -50            12843.79590          datacenter ContainerSquare                           
 -56                      0              room CON-161-A                                   
 -57            12843.79590              room CON-161-A1                                  
 -11             1092.49060                  host ceph-08                                 
 -13             1074.27673                  host ceph-09                                 
 -23             1075.67920                  host ceph-10                                 
 -15             1067.16492                  host ceph-11                                 
 -25             1080.21912                  host ceph-12                                 
 -83             1061.17480                  host ceph-13                                 
 -85             1047.70276                  host ceph-14                                 
 -87             1079.02820                  host ceph-15                                 
-136             1012.55048                  host ceph-16                                 
-139             1073.61475                  host ceph-17                                 
-261             1125.57202                  host ceph-23                                 
-262             1054.32227                  host ceph-24                                 
-148              885.49133          datacenter MultiSite                                 
 -65               86.16304              host ceph-04                                     
 -67              101.50623              host ceph-05                                     
 -69              104.85805              host ceph-06                                     
 -71               96.39923              host ceph-07                                     
 -81               97.54230              host ceph-18                                     
 -94               98.48271              host ceph-19                                     
  -4               97.20181              host ceph-20                                     
 -64               99.77657              host ceph-21                                     
 -66              103.56137              host ceph-22                                     
 -49              885.49133          datacenter ServerRoom                                
 -55              885.49133              room SR-113                                      
 -65               86.16304                  host ceph-04                                 
 -67              101.50623                  host ceph-05                                 
 -69              104.85805                  host ceph-06                                 
 -71               96.39923                  host ceph-07                                 
 -81               97.54230                  host ceph-18                                 
 -94               98.48271                  host ceph-19                                 
  -4               97.20181                  host ceph-20                                 
 -64               99.77657                  host ceph-21                                 
 -66              103.56137                  host ceph-22                                 
  -1                      0  root default
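
In case multiple roots do turn out to be the trigger: a quick way to list the roots of a cluster together with the total crush weight of the OSDs below each is to walk "ceph osd tree --format json". A sketch, assuming the usual nodes/children/crush_weight layout of that output:

#!/usr/bin/env python3
# Sketch: list the crush roots of a cluster and the total crush weight of
# the OSDs below each, to spot extra roots (such as our BB draining root).
import json
import subprocess

tree = json.loads(subprocess.run(["ceph", "osd", "tree", "--format", "json"],
                                 check=True, capture_output=True,
                                 text=True).stdout)
nodes = {n["id"]: n for n in tree.get("nodes", [])}

def subtree_weight(node_id):
    node = nodes.get(node_id)
    if node is None:
        return 0.0
    if node["type"] == "osd":
        return node.get("crush_weight", 0.0)
    return sum(subtree_weight(child) for child in node.get("children", []))

for node in nodes.values():
    if node["type"] == "root":
        print("root %-10s crush weight of OSDs below: %10.2f"
              % (node["name"], subtree_weight(node["id"])))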

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx