Re: OSD META usage growing without bounds

And here is an overview from the PR (https://github.com/ceph/ceph/pull/35473), which looks to some degree in line with your initial observations (a cluster sitting idle for a long period):

"Original problem stemmed from BlueFS inability to replay log, which was caused by BlueFS previously wrote replay log that was corrupted, which was caused by BlueFS log growing to extreme size (~600GB), which was caused by OSD working in a way, when BlueFS::sync_metadata was never invoked."


Igor


On 1/11/2022 2:52 PM, Igor Fedotov wrote:
Frank,

btw - are you aware of https://tracker.ceph.com/issues/45903 ?

I can see it was rejected for mimic for whatever reason, so I presume it might be pretty relevant to your case...


Thanks,

Igor

On 1/11/2022 2:45 PM, Frank Schilder wrote:
Hi Igor,

thanks for your reply. To avoid further OSD failures, I shut down the cluster yesterday. Unfortunately, after the restart all OSDs trimmed whatever had been filling them up:

[root@rit-tceph ~]# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP   META     AVAIL   %USE VAR  PGS TYPE NAME
-1       2.44707        - 2.4 TiB 9.2 GiB 255 MiB 25 KiB  9.0 GiB 2.4 TiB 0.37 1.00   - root default
-5       0.81569        - 835 GiB 3.1 GiB  85 MiB 19 KiB  3.0 GiB 832 GiB 0.37 1.00   -     host tceph-01
 0   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB 19 KiB 1024 MiB 277 GiB 0.37 1.00 169         osd.0
 3   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 164         osd.3
 8   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 167         osd.8
-3       0.81569        - 835 GiB 3.1 GiB  85 MiB  3 KiB  3.0 GiB 832 GiB 0.37 1.00   -     host tceph-02
 2   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 157         osd.2
 4   hdd 0.27190  1.00000 278 GiB 1.0 GiB  29 MiB    0 B    1 GiB 277 GiB 0.37 1.00 172         osd.4
 6   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB  3 KiB 1024 MiB 277 GiB 0.37 1.00 171         osd.6
-7       0.81569        - 835 GiB 3.1 GiB  85 MiB  3 KiB  3.0 GiB 832 GiB 0.37 1.00   -     host tceph-03
 1   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 171         osd.1
 5   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB  3 KiB 1024 MiB 277 GiB 0.37 1.00 160         osd.5
 7   hdd 0.27190  1.00000 278 GiB 1.0 GiB  28 MiB    0 B    1 GiB 277 GiB 0.37 1.00 169         osd.7
                     TOTAL 2.4 TiB 9.2 GiB 255 MiB 25 KiB  9.0 GiB 2.4 TiB 0.37
MIN/MAX VAR: 1.00/1.00  STDDEV: 0

The OSDs didn't log what they were doing on startup; the log goes straight from BlueFS init to PG scrub messages (with a very long wait in between). iostat showed very heavy reads on the drives during that trimming/boot phase.
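If that trimming phase happens again, I could raise the BlueFS debug level beforehand so it shows up in the OSD log; roughly like this (the levels are a guess and may be quite verbose; the same settings could go into ceph.conf under [osd] instead):

    # persist higher debug levels so the next OSD start logs what the
    # trimming/boot phase is actually doing
    ceph config set osd debug_bluefs 10
    ceph config set osd debug_bluestore 10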

I'm not sure it helps to collect perf counters already now. I will wait until I see unusual growth in META again. I don't think the problem is present from the start; it looked more like the OSDs started filling META up independently at different times. I will let the cluster sit idle as before and keep watching, roughly as sketched below. Hope I find something.
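For the record, the kind of watch loop I have in mind is something like this (interval and log file are arbitrary):

    # record per-OSD usage every 10 minutes so the onset of META growth
    # can be pinned down later
    while true; do
        date
        ceph osd df tree
        sleep 600
    done >> /root/osd-df-watch.log 2>&1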

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Igor Fedotov <igor.fedotov@xxxxxxxx>
Sent: 11 January 2022 10:27:14
To: Frank Schilder; ceph-users
Subject: Re:  OSD META usage growing without bounds

Hi Frank,

you might want to collect a couple of perf dumps for the OSD in question at, e.g., a one-hour interval and inspect which counters are growing in the bluefs section. "log_bytes" is of particular interest...
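Something like the following should do it (a sketch; run on the OSD host, assumes the admin socket is reachable and jq is installed; the extra counter names besides log_bytes are from memory, adjust to what your perf dump actually contains):

    # take two snapshots roughly an hour apart
    OSD=8
    ceph daemon osd.$OSD perf dump > perf.osd$OSD.1.json
    sleep 3600
    ceph daemon osd.$OSD perf dump > perf.osd$OSD.2.json

    # print the bluefs section of each snapshot to see which counters grew
    for f in perf.osd$OSD.1.json perf.osd$OSD.2.json; do
        echo "== $f"
        jq '.bluefs | {log_bytes, logged_bytes, db_used_bytes, slow_used_bytes}' "$f"
    done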


Thanks,

Igor


On 1/10/2022 2:25 PM, Frank Schilder wrote:
Hi, I'm observing strange behaviour on a small test cluster (13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)). The cluster has been up for about half a year and is almost empty. We did a few rbd bench runs and created a file system, but there has been zero client I/O for at least 3 months. It looks like the OSD META usage of some OSDs recently started to increase for no apparent reason. One OSD has already died with 100% usage and another is on its way. I can't see any obvious cause for this behaviour.

If anyone has an idea, please let me know.

Some diagnostic output:

[root@rit-tceph ~]# ceph status
    cluster:
      id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
      health: HEALTH_WARN
              1 nearfull osd(s)
              3 pool(s) nearfull

    services:
      mon: 3 daemons, quorum tceph-01,tceph-02,tceph-03
      mgr: tceph-01(active), standbys: tceph-02, tceph-03
      mds: testfs-1/1/1 up  {0=tceph-01=up:active}, 2 up:standby
      osd: 9 osds: 8 up, 8 in

    data:
      pools:   3 pools, 500 pgs
      objects: 24  objects, 2.3 KiB
      usage:   746 GiB used, 1.4 TiB / 2.2 TiB avail
      pgs:     500 active+clean

[root@rit-tceph ~]# ceph df
GLOBAL:
      SIZE        AVAIL       RAW USED     %RAW USED
      2.2 TiB     1.4 TiB      746 GiB         33.49
POOLS:
      NAME                ID     USED        %USED     MAX AVAIL     OBJECTS
      test                1         19 B         0        81 GiB           2
      testfs_data         2          0 B         0        81 GiB           0
      testfs_metadata     3      2.2 KiB         0        81 GiB          22

[root@rit-tceph ~]# ceph osd df tree
ID CLASS WEIGHT  REWEIGHT SIZE    USE     DATA    OMAP   META    AVAIL   %USE  VAR  PGS TYPE NAME
-1       2.44707        - 2.2 TiB 746 GiB 120 MiB 34 KiB 746 GiB 1.4 TiB 33.49 1.00   - root default
-5       0.81569        - 557 GiB 195 GiB  30 MiB  3 KiB 195 GiB 362 GiB 35.04 1.05   -     host tceph-01
 0   hdd 0.27190  1.00000 278 GiB  38 GiB  15 MiB  3 KiB  38 GiB 241 GiB 13.61 0.41 260         osd.0
 3   hdd 0.27190        0     0 B     0 B     0 B    0 B     0 B     0 B      0    0   0         osd.3
 8   hdd 0.27190  1.00000 278 GiB 157 GiB  15 MiB    0 B 157 GiB 121 GiB 56.47 1.69 240         osd.8
-3       0.81569        - 835 GiB 113 GiB  45 MiB  3 KiB 113 GiB 723 GiB 13.48 0.40   -     host tceph-02
 2   hdd 0.27190  1.00000 278 GiB  18 GiB  15 MiB    0 B  18 GiB 261 GiB  6.30 0.19 157         osd.2
 4   hdd 0.27190  1.00000 278 GiB  48 GiB  15 MiB    0 B  48 GiB 231 GiB 17.21 0.51 172         osd.4
 6   hdd 0.27190  1.00000 278 GiB  47 GiB  15 MiB  3 KiB  47 GiB 231 GiB 16.93 0.51 171         osd.6
-7       0.81569        - 835 GiB 438 GiB  45 MiB 28 KiB 438 GiB 397 GiB 52.48 1.57   -     host tceph-03
 1   hdd 0.27190  1.00000 278 GiB 238 GiB  15 MiB 25 KiB 238 GiB  41 GiB 85.35 2.55 171         osd.1
 5   hdd 0.27190  1.00000 278 GiB 200 GiB  15 MiB  3 KiB 200 GiB  79 GiB 71.68 2.14 160         osd.5
 7   hdd 0.27190  1.00000 278 GiB 1.1 GiB  15 MiB    0 B 1.1 GiB 277 GiB  0.40 0.01 169         osd.7
                      TOTAL 2.2 TiB 746 GiB 120 MiB 34 KiB 746 GiB 1.4 TiB 33.49
MIN/MAX VAR: 0.01/2.55  STDDEV: 30.50

2 hours later:

[root@rit-tceph ~]# ceph status
    cluster:
      id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
      health: HEALTH_WARN
              1 nearfull osd(s)
              3 pool(s) nearfull

    services:
      mon: 3 daemons, quorum tceph-01,tceph-02,tceph-03
      mgr: tceph-01(active), standbys: tceph-02, tceph-03
      mds: testfs-1/1/1 up  {0=tceph-01=up:active}, 2 up:standby
      osd: 9 osds: 8 up, 8 in

    data:
      pools:   3 pools, 500 pgs
      objects: 24  objects, 2.3 KiB
      usage:   748 GiB used, 1.4 TiB / 2.2 TiB avail
      pgs:     500 active+clean

The usage is increasing surprisingly fast.

Thanks for any pointers!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



