Re: OSD META usage growing without bounds

And here is an overview from the PR (https://github.com/ceph/ceph/pull/35473), which looks to some degree in line with your initial observations (a cluster sitting idle for a long period):

"Original problem stemmed from BlueFS inability to replay log, which was caused by BlueFS previously wrote replay log that was corrupted, which was caused by BlueFS log growing to extreme size (~600GB), which was caused by OSD working in a way, when BlueFS::sync_metadata was never invoked."


Igor


On 1/11/2022 2:52 PM, Igor Fedotov wrote:
Frank,

btw - are you aware of https://tracker.ceph.com/issues/45903 ?

I can see it was rejected for mimic for whatever reason, so I presume it might be pretty relevant to your case...


Thanks,

Igor

On 1/11/2022 2:45 PM, Frank Schilder wrote:
Hi Igor,

thanks for your reply. To avoid further OSD failures, I shut down the cluster yesterday. Unfortunately, after the restart all OSDs trimmed whatever had been filling them up:

[root@rit-tceph ~]# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP   META     AVAIL   %USE VAR  PGS TYPE NAME
-1       2.44707        - 2.4 TiB 9.2 GiB 255 MiB 25 KiB  9.0 GiB 2.4 TiB 0.37 1.00   - root default
-5       0.81569        - 835 GiB 3.1 GiB  85 MiB 19 KiB  3.0 GiB 832 GiB 0.37 1.00   -     host tceph-01
 0   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB 19 KiB 1024 MiB 277 GiB 0.37 1.00 169         osd.0
 3   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 164         osd.3
 8   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 167         osd.8
-3       0.81569        - 835 GiB 3.1 GiB  85 MiB  3 KiB  3.0 GiB 832 GiB 0.37 1.00   -     host tceph-02
 2   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 157         osd.2
 4   hdd 0.27190  1.00000 278 GiB 1.0 GiB  29 MiB    0 B    1 GiB 277 GiB 0.37 1.00 172         osd.4
 6   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB  3 KiB 1024 MiB 277 GiB 0.37 1.00 171         osd.6
-7       0.81569        - 835 GiB 3.1 GiB  85 MiB  3 KiB  3.0 GiB 832 GiB 0.37 1.00   -     host tceph-03
 1   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 171         osd.1
 5   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB  3 KiB 1024 MiB 277 GiB 0.37 1.00 160         osd.5
 7   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 169         osd.7
                     TOTAL 2.4 TiB 9.2 GiB 255 MiB 25 KiB  9.0 GiB 2.4 TiB 0.37
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

The OSDs didn't log what they were doing on startup; the log goes straight from BlueFS init to PG scrub messages (with a very long wait in between). iostat showed very heavy reads on the drives during that trimming/boot phase.
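If that trimming phase happens again, I could raise the BlueFS debug level beforehand so it shows up in the OSD log; roughly like this (the levels are a guess and may be quite verbose; the same settings could go into ceph.conf under [osd] instead):

    # persist higher debug levels so the next OSD start logs what the
    # trimming/boot phase is actually doing
    ceph config set osd debug_bluefs 10
    ceph config set osd debug_bluestore 10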

I'm not sure it helps to collect perf counters already now. I will wait until I see unusual growth in META again. I don't think the problem is present from the start; it looked more like the OSDs started filling META up independently at different times. I will let the cluster sit idle as before and keep watching, roughly as sketched below. Hope I find something.
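For the record, the kind of watch loop I have in mind is something like this (interval and log file are arbitrary):

    # record per-OSD usage every 10 minutes so the onset of META growth
    # can be pinned down later
    while true; do
        date
        ceph osd df tree
        sleep 600
    done >> /root/osd-df-watch.log 2>&1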

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Igor Fedotov <igor.fedotov@xxxxxxxx>
Sent: 11 January 2022 10:27:14
To: Frank Schilder; ceph-users
Subject: Re:  OSD META usage growing without bounds

Hi Frank,

you might want to collect a couple of perf dumps for the OSD in question at, e.g., a one-hour interval and inspect which counters are growing in the bluefs section. "log_bytes" is of particular interest...
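Something like the following should do it (a sketch; run on the OSD host, assumes the admin socket is reachable and jq is installed; the extra counter names besides log_bytes are from memory, adjust to what your perf dump actually contains):

    # take two snapshots roughly an hour apart
    OSD=8
    ceph daemon osd.$OSD perf dump > perf.osd$OSD.1.json
    sleep 3600
    ceph daemon osd.$OSD perf dump > perf.osd$OSD.2.json

    # print the bluefs section of each snapshot to see which counters grew
    for f in perf.osd$OSD.1.json perf.osd$OSD.2.json; do
        echo "== $f"
        jq '.bluefs | {log_bytes, logged_bytes, db_used_bytes, slow_used_bytes}' "$f"
    done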


Thanks,

Igor


On 1/10/2022 2:25 PM, Frank Schilder wrote:
Hi, I'm observing strange behaviour on a small test cluster (13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)). The cluster has been up for about half a year and is almost empty. We did a few rbd bench runs and created a file system, but there has been zero client I/O for at least 3 months. It looks like the OSD META usage of some OSDs recently started to increase for no apparent reason. One OSD has already died with 100% usage and another is on its way. I can't see any obvious cause for this behaviour.

If anyone has an idea, please let me know.

Some diagnostic output:

[root@rit-tceph ~]# ceph status
    cluster:
      id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
      health: HEALTH_WARN
              1 nearfull osd(s)
              3 pool(s) nearfull

    services:
      mon: 3 daemons, quorum tceph-01,tceph-02,tceph-03
      mgr: tceph-01(active), standbys: tceph-02, tceph-03
      mds: testfs-1/1/1 up  {0=tceph-01=up:active}, 2 up:standby
      osd: 9 osds: 8 up, 8 in

    data:
      pools:   3 pools, 500 pgs
      objects: 24  objects, 2.3 KiB
      usage:   746 GiB used, 1.4 TiB / 2.2 TiB avail
      pgs:     500 active+clean

[root@rit-tceph ~]# ceph df
GLOBAL:
      SIZE        AVAIL       RAW USED     %RAW USED
      2.2 TiB     1.4 TiB      746 GiB         33.49
POOLS:
      NAME                ID     USED        %USED     MAX AVAIL     OBJECTS
      test                1         19 B         0        81 GiB           2
      testfs_data         2          0 B         0        81 GiB           0
      testfs_metadata     3      2.2 KiB         0        81 GiB          22

[root@rit-tceph ~]# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP   META    AVAIL   %USE  VAR  PGS TYPE NAME
-1       2.44707        - 2.2 TiB 746 GiB 120 MiB 34 KiB 746 GiB 1.4 TiB 33.49 1.00   - root default
-5       0.81569        - 557 GiB 195 GiB  30 MiB  3 KiB 195 GiB 362 GiB 35.04 1.05   -     host tceph-01
 0   hdd 0.27190  1.00000 278 GiB  38 GiB  15 MiB  3 KiB  38 GiB 241 GiB 13.61 0.41 260         osd.0
 3   hdd 0.27190        0     0 B     0 B     0 B    0 B     0 B     0 B      0    0   0         osd.3
 8   hdd 0.27190  1.00000 278 GiB 157 GiB  15 MiB    0 B 157 GiB 121 GiB 56.47 1.69 240         osd.8
-3       0.81569        - 835 GiB 113 GiB  45 MiB  3 KiB 113 GiB 723 GiB 13.48 0.40   -     host tceph-02
 2   hdd 0.27190  1.00000 278 GiB  18 GiB  15 MiB    0 B  18 GiB 261 GiB  6.30 0.19 157         osd.2
 4   hdd 0.27190  1.00000 278 GiB  48 GiB  15 MiB    0 B  48 GiB 231 GiB 17.21 0.51 172         osd.4
 6   hdd 0.27190  1.00000 278 GiB  47 GiB  15 MiB  3 KiB  47 GiB 231 GiB 16.93 0.51 171         osd.6
-7       0.81569        - 835 GiB 438 GiB  45 MiB 28 KiB 438 GiB 397 GiB 52.48 1.57   -     host tceph-03
 1   hdd 0.27190  1.00000 278 GiB 238 GiB  15 MiB 25 KiB 238 GiB  41 GiB 85.35 2.55 171         osd.1
 5   hdd 0.27190  1.00000 278 GiB 200 GiB  15 MiB  3 KiB 200 GiB  79 GiB 71.68 2.14 160         osd.5
 7   hdd 0.27190  1.00000 278 GiB 1.1 GiB  15 MiB    0 B 1.1 GiB 277 GiB  0.40 0.01 169         osd.7
                      TOTAL 2.2 TiB 746 GiB 120 MiB 34 KiB 746 GiB 1.4 TiB 33.49
MIN/MAX VAR: 0.01/2.55  STDDEV: 30.50

2 hours later:

[root@rit-tceph ~]# ceph status
    cluster:
      id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
      health: HEALTH_WARN
              1 nearfull osd(s)
              3 pool(s) nearfull

    services:
      mon: 3 daemons, quorum tceph-01,tceph-02,tceph-03
      mgr: tceph-01(active), standbys: tceph-02, tceph-03
      mds: testfs-1/1/1 up  {0=tceph-01=up:active}, 2 up:standby
      osd: 9 osds: 8 up, 8 in

    data:
      pools:   3 pools, 500 pgs
      objects: 24  objects, 2.3 KiB
      usage:   748 GiB used, 1.4 TiB / 2.2 TiB avail
      pgs:     500 active+clean

The usage is increasing surprisingly fast.

Thanks for any pointers!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



