Re: Bluefs spillover

Hi Igor,

Thanks for this. Indeed, we are removing data in the form of OSDs and
recreating them, as we recently converted this cluster to use containers.
I'm not aware of any bulk removals of data inside RBD. Do you happen to
know how that command would be used with containers? cephadm is only
present on the mon/mgr nodes.
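
For reference, here is a rough sketch of what I think we would run on the
OSD host if cephadm were installed there as well (the OSD id, fsid, and LV
names below are placeholders rather than values from our cluster, so please
treat this as an untested assumption):

    # stop the containerized OSD first (cephadm units are named
    # ceph-<cluster-fsid>@osd.<id>)
    systemctl stop ceph-<cluster-fsid>@osd.106.service

    # run ceph-volume inside a container via cephadm
    cephadm ceph-volume -- lvm migrate \
        --osd-id 106 --osd-fsid <osd-fsid> \
        --from data --target <db-vg>/<db-lv>

    systemctl start ceph-<cluster-fsid>@osd.106.service

Is that roughly the intended procedure?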

On Mon, Aug 26, 2024 at 2:51 PM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:

> Ruben,
>
> given the recorded maximums for low DB levels (roughly 47 GB), it looks
> like the OSD DBs took much more space in the past. Perhaps you made some
> bulk data removals recently. Compaction could cause such a drop as well.
>
> Anyway, this prevents BlueStore from using the extra DB space for high
> (aka SLOW) levels - they land on the main (slow) device.
>
> So you have two options:
>
> 1) Expand DB volume to match your potential metadata sizes.
>
> 2) If you expect no metadata size growth - migrate data from the slow
> device as described at the link I shared. Since migration is an offline
> process, it requires an OSD restart and hence resets the recorded
> maximums. After that, BlueStore will likely be able to use the extra DB
> space for new data (to some degree, of course).
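>
> For illustration, option 2 on a plain (non-containerized) OSD host might
> look roughly like this - the OSD must be stopped first, and the id, fsid
> and LV names below are placeholders, not a tested command line:
>
>     ceph-volume lvm migrate --osd-id <id> --osd-fsid <osd-fsid> \
>         --from data --target <db-vg>/<db-lv>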
>
>
> Thanks,
>
> Igor
>
>
> On 8/26/2024 12:37 PM, Ruben Bosch wrote:
> > Hi Igor,
> >
> > Thank you for your fast reply. I'll look into the provided URL, thanks.
> > Please see below:
> >
> > for i in osd.17 osd.37 osd.89 osd.91 osd.95 osd.106; do ceph tell $i bluefs stats; done
> > 1 : device size 0xc7fffe000 : using 0x423c00000(17 GiB)
> > 2 : device size 0x9187fc00000 : using 0x5bb0bab0000(5.7 TiB)
> > RocksDBBlueFSVolumeSelector Usage Matrix:
> > DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES
> > LOG         0 B         18 MiB      0 B         0 B         0 B         15 MiB      1
> > WAL         0 B         36 MiB      0 B         0 B         0 B         28 MiB      2
> > DB          0 B         17 GiB      0 B         0 B         0 B         13 GiB      211
> > SLOW        0 B         0 B         70 MiB      0 B         0 B         62 MiB      1
> > TOTAL       0 B         17 GiB      70 MiB      0 B         0 B         0 B         215
> > MAXIMUMS:
> > LOG         0 B         22 MiB      0 B         0 B         0 B         18 MiB
> > WAL         0 B         126 MiB     0 B         0 B         0 B         92 MiB
> > DB          0 B         47 GiB      986 MiB     0 B         0 B         17 GiB
> > SLOW        0 B         3.0 GiB     352 MiB     0 B         0 B         2.4 GiB
> > TOTAL       0 B         50 GiB      1.3 GiB     0 B         0 B         0 B
> > >> SIZE <<  0 B         48 GiB      8.6 TiB
> > 1 : device size 0xc7fffe000 : using 0x434200000(17 GiB)
> > 2 : device size 0x9187fc00000 : using 0x5fd47880000(6.0 TiB)
> > RocksDBBlueFSVolumeSelector Usage Matrix:
> > DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES
> > LOG         0 B         14 MiB      0 B         0 B         0 B         9.4 MiB     1
> > WAL         0 B         18 MiB      0 B         0 B         0 B         11 MiB      1
> > DB          0 B         17 GiB      0 B         0 B         0 B         13 GiB      216
> > SLOW        0 B         0 B         70 MiB      0 B         0 B         53 MiB      1
> > TOTAL       0 B         17 GiB      70 MiB      0 B         0 B         0 B         219
> > MAXIMUMS:
> > LOG         0 B         22 MiB      0 B         0 B         0 B         18 MiB
> > WAL         0 B         126 MiB     0 B         0 B         0 B         93 MiB
> > DB          0 B         48 GiB      0 B         0 B         0 B         16 GiB
> > SLOW        0 B         1.9 GiB     141 MiB     0 B         0 B         1.5 GiB
> > TOTAL       0 B         49 GiB      141 MiB     0 B         0 B         0 B
> > >> SIZE <<  0 B         48 GiB      8.6 TiB
> > 1 : device size 0xc7fffe000 : using 0x458500000(17 GiB)
> > 2 : device size 0x9187fc00000 : using 0x6b27b860000(6.7 TiB)
> > RocksDBBlueFSVolumeSelector Usage Matrix:
> > DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES
> > LOG         0 B         14 MiB      0 B         0 B         0 B         12 MiB      1
> > WAL         0 B         72 MiB      0 B         0 B         0 B         51 MiB      4
> > DB          0 B         17 GiB      0 B         0 B         0 B         14 GiB      231
> > SLOW        0 B         0 B         70 MiB      0 B         0 B         60 MiB      1
> > TOTAL       0 B         17 GiB      70 MiB      0 B         0 B         0 B         237
> > MAXIMUMS:
> > LOG         0 B         22 MiB      0 B         0 B         0 B         18 MiB
> > WAL         0 B         198 MiB     0 B         0 B         0 B         172 MiB
> > DB          0 B         48 GiB      0 B         0 B         0 B         20 GiB
> > SLOW        0 B         1.7 GiB     423 MiB     0 B         0 B         1.1 GiB
> > TOTAL       0 B         49 GiB      423 MiB     0 B         0 B         0 B
> > >> SIZE <<  0 B         48 GiB      8.6 TiB
> > 1 : device size 0xc7fffe000 : using 0x476800000(18 GiB)
> > 2 : device size 0x9187fc00000 : using 0x6a6d46f0000(6.7 TiB)
> > RocksDBBlueFSVolumeSelector Usage Matrix:
> > DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES
> > LOG         0 B         14 MiB      0 B         0 B         0 B         9.6 MiB     1
> > WAL         0 B         54 MiB      0 B         0 B         0 B         34 MiB      3
> > DB          0 B         18 GiB      0 B         0 B         0 B         14 GiB      238
> > SLOW        0 B         71 MiB      141 MiB     0 B         0 B         5.9 MiB     3
> > TOTAL       0 B         18 GiB      141 MiB     0 B         0 B         0 B         245
> > MAXIMUMS:
> > LOG         0 B         22 MiB      0 B         0 B         0 B         18 MiB
> > WAL         0 B         180 MiB     0 B         0 B         0 B         155 MiB
> > DB          0 B         41 GiB      0 B         0 B         0 B         19 GiB
> > SLOW        0 B         6.5 GiB     2.5 GiB     0 B         0 B         6.8 GiB
> > TOTAL       0 B         43 GiB      2.5 GiB     0 B         0 B         0 B
> > >> SIZE <<  0 B         48 GiB      8.6 TiB
> > 1 : device size 0xc7fffe000 : using 0x405100000(16 GiB)
> > 2 : device size 0x9187fc00000 : using 0x5bb806e0000(5.7 TiB)
> > RocksDBBlueFSVolumeSelector Usage Matrix:
> > DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES
> > LOG         0 B         18 MiB      0 B         0 B         0 B         13 MiB      1
> > WAL         0 B         36 MiB      0 B         0 B         0 B         15 MiB      2
> > DB          0 B         16 GiB      0 B         0 B         0 B         13 GiB      210
> > SLOW        0 B         0 B         141 MiB     0 B         0 B         71 MiB      2
> > TOTAL       0 B         16 GiB      141 MiB     0 B         0 B         0 B         215
> > MAXIMUMS:
> > LOG         0 B         22 MiB      0 B         0 B         0 B         18 MiB
> > WAL         0 B         126 MiB     0 B         0 B         0 B         93 MiB
> > DB          0 B         48 GiB      0 B         0 B         0 B         16 GiB
> > SLOW        0 B         2.0 GiB     141 MiB     0 B         0 B         1.8 GiB
> > TOTAL       0 B         50 GiB      141 MiB     0 B         0 B         0 B
> > >> SIZE <<  0 B         48 GiB      8.6 TiB
> > 1 : device size 0xc7fffe000 : using 0x3cdd00000(15 GiB)
> > 2 : device size 0x9187fc00000 : using 0x5bb9b2e0000(5.7 TiB)
> > RocksDBBlueFSVolumeSelector Usage Matrix:
> > DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES
> > LOG         0 B         6 MiB       0 B         0 B         0 B         3.4 MiB     1
> > WAL         0 B         108 MiB     0 B         0 B         0 B         78 MiB      6
> > DB          0 B         15 GiB      0 B         0 B         0 B         12 GiB      202
> > SLOW        0 B         142 MiB     70 MiB      0 B         0 B         34 MiB      3
> > TOTAL       0 B         15 GiB      70 MiB      0 B         0 B         0 B         212
> > MAXIMUMS:
> > LOG         0 B         22 MiB      0 B         0 B         0 B         18 MiB
> > WAL         0 B         126 MiB     0 B         0 B         0 B         93 MiB
> > DB          0 B         49 GiB      563 MiB     0 B         0 B         16 GiB
> > SLOW        0 B         1014 MiB    323 MiB     0 B         0 B         895 MiB
> > TOTAL       0 B         50 GiB      886 MiB     0 B         0 B         0 B
> > >> SIZE <<  0 B         48 GiB      8.6 TiB
> > On Mon, Aug 26, 2024 at 11:01 AM Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
> >
> >> Hi Ruben,
> >>
> >> it would be nice if you could share the 'ceph tell osd.N bluefs stats'
> >> command output for these OSDs.
> >>
> >> Also, you might want to read the following thread
> >> https://www.spinics.net/lists/ceph-users/msg79062.html
> >>
> >> which describes using 'ceph-volume lvm migrate' (or its counterpart in
> >> ceph-bluestore-tool) to migrate BlueFS data from the slow to the DB volume.
> >>
> >> The latter might have a temporary or permanent effect, depending on the
> >> spillover's root cause, though.
> >>
> >>
> >> Thanks,
> >>
> >> Igor
> >>
> >> On 8/26/2024 10:08 AM, Ruben Bosch wrote:
> >>> Hi all,
> >>>
> >>> ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
> >>> We are working on marking out OSDs on a host with EC 4+2. The OSDs are
> >>> HDDs with a separate DB on an NVMe disk. All of the operations take ages.
> >>> After some time we see BLUEFS_SPILLOVER. Telling the affected OSDs to
> >>> compact sometimes helps, but not always. The OSDs have plenty of space
> >>> remaining in the DB, but the spillover does not disappear.
> >>>
> >>> [WRN] BLUEFS_SPILLOVER: 2 OSD(s) experiencing BlueFS spillover
> >>>        osd.91 spilled over 141 MiB metadata from 'db' device (15 GiB used of 50 GiB) to slow device
> >>>        osd.106 spilled over 70 MiB metadata from 'db' device (12 GiB used of 50 GiB) to slow device
> >>>
> >>> Has anyone seen similar behavior before, and have they found a
> >>> workaround or solution?
> >>>
> >>> Kind regards,
> >>>
> >>> Ruben Bosch
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users@xxxxxxx
> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >> --
> >> Igor Fedotov
> >> Ceph Lead Developer
> >>
> >> Looking for help with your Ceph cluster? Contact us at https://croit.io
> >>
> >> croit GmbH, Freseniusstr. 31h, 81247 Munich
> >> CEO: Martin Verges - VAT-ID: DE310638492
> >> Com. register: Amtsgericht Munich HRB 231263
> >> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> >>
> >>
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



