Re: dealing with spillovers

Reed Dier <reed.dier@xxxxxxxxxxx> · Sat, 6 Jun 2020 00:00:35 -0500

The WAL/DB was part of the OSD deployment.
OSD is running 14.2.9.

Would grabbing the output of ceph-kvstore-tool bluestore-kv <path-to-osd> stats, as in that ticket, be of any use here?

Thanks,

Reed

On Jun 5, 2020, at 5:27 PM, Igor Fedotov <ifedotov@xxxxxxx> wrote:

This might help: see comment #4 at https://tracker.ceph.com/issues/44509

And just for the sake of information collection: what Ceph version is used in this cluster?
Did you set up the DB volume along with the OSD deployment, or was it added later, as was done in the ticket above?

Thanks,
Igor

    On 6/6/2020 1:07 AM, Reed Dier wrote:

      I'm going to piggyback on this somewhat.

      I've battled RocksDB spillovers over the life of the cluster since moving to BlueStore; however, I have always been able to compact it well enough.

      But now I am stumped: I can't get this one to compact via ceph tell osd.$osd compact, which has always worked in the past.

      No matter how many times I compact it, it always spills over exactly 192 KiB.

          BLUEFS_SPILLOVER BlueFS spillover detected on 1 OSD(s)
               osd.36 spilled over 192 KiB metadata from 'db' device (26 GiB used of 34 GiB) to slow device
               osd.36 spilled over 192 KiB metadata from 'db' device (16 GiB used of 34 GiB) to slow device
               osd.36 spilled over 192 KiB metadata from 'db' device (22 GiB used of 34 GiB) to slow device
               osd.36 spilled over 192 KiB metadata from 'db' device (13 GiB used of 34 GiB) to slow device

      The multiple entries are from different attempts at compacting it.
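Since the question is whether the spilled amount ever changes between attempts, the health-detail lines above can be tallied mechanically. The following is a minimal Python sketch; it assumes the per-OSD line wording exactly as quoted in this thread (other Ceph releases may phrase it differently), and `parse_spillover` and `to_bytes` are hypothetical helpers, not part of Ceph.

```python
import re

# Per-OSD line of the BLUEFS_SPILLOVER health check, in the wording quoted above.
# Assumption: this wording matches your Ceph release; \s+ tolerates line wrapping.
SIZE = r"[\d.]+\s+(?:B|KiB|MiB|GiB|TiB)"
SPILL_RE = re.compile(
    rf"osd\.(?P<osd>\d+)\s+spilled\s+over\s+(?P<spilled>{SIZE})\s+metadata\s+"
    rf"from\s+'db'\s+device\s+\((?P<used>{SIZE})\s+used\s+of\s+(?P<total>{SIZE})\)\s+"
    rf"to\s+slow\s+device"
)

# Binary units as printed by Ceph (KiB = 1024 bytes, and so on).
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def to_bytes(size: str) -> int:
    """Convert a size such as '192 KiB' into bytes."""
    value, unit = size.split()
    return int(float(value) * UNITS[unit])


def parse_spillover(text: str) -> list:
    """Return one dict per spillover line found in `text`."""
    return [
        {
            "osd": int(m["osd"]),
            "spilled_bytes": to_bytes(m["spilled"]),
            "db_used_bytes": to_bytes(m["used"]),
            "db_total_bytes": to_bytes(m["total"]),
        }
        for m in SPILL_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "osd.36 spilled over 192 KiB metadata from 'db' device "
        "(26 GiB used of 34 GiB) to slow device"
    )
    for rec in parse_spillover(sample):
        print(rec["osd"], rec["spilled_bytes"], rec["db_used_bytes"] / 2**30)
        # prints: 36 196608 26.0
```

Feeding it successive captures of `ceph health detail` makes it easy to see, for example, that the spilled amount stays at 192 KiB while the DB usage moves between 13 and 26 GiB.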

      The OSD is a 1.92 TB SATA SSD; the WAL/DB is a 36 GB partition on NVMe.
      I tailed and tee'd the OSD's log during a manual compaction; it's here: https://pastebin.com/bcpcRGEe
      This is with the normal logging level.
      I have no idea how to make heads or tails of that log data, but maybe someone can figure out why this one OSD just refuses to compact?

      OSD is 14.2.9.
      OS is U18.04.
      Kernel is 4.15.0-96.

      I haven't played with ceph-bluestore-tool or ceph-kvstore-tool, but after seeing them mentioned above in this thread, I do see ceph-kvstore-tool <rocksdb|bluestore-kv?> compact, which sounds like it may be the same thing that ceph tell compact does under the hood?

          compact
              Subcommand compact is used to compact all data of kvstore. It will open the database, and trigger a database's compaction. After compaction, some disk space may be released.
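Unlike ceph tell osd.$osd compact, which asks the running daemon to compact, ceph-kvstore-tool opens the store itself, so the OSD has to be stopped first. A minimal Python sketch of that offline sequence follows; it only prints the commands unless `dry_run=False`. It assumes a package-based install with the `ceph-osd@<id>` systemd unit and the default `/var/lib/ceph/osd/ceph-<id>` data directory (containerized deployments differ), and the helper names are hypothetical.

```python
import subprocess


def offline_compact_commands(osd_id: int, data_root: str = "/var/lib/ceph/osd") -> list:
    """Command sequence for compacting one OSD's RocksDB offline.

    Assumptions (adjust for your deployment): a package-based install with the
    systemd unit ceph-osd@<id> and the data directory <data_root>/ceph-<id>.
    """
    path = f"{data_root}/ceph-{osd_id}"
    unit = f"ceph-osd@{osd_id}"
    return [
        # Cluster-wide flag: keeps the stopped OSD from being marked out.
        ["ceph", "osd", "set", "noout"],
        # ceph-kvstore-tool opens the store directly, so the daemon must be down.
        ["systemctl", "stop", unit],
        ["ceph-kvstore-tool", "bluestore-kv", path, "compact"],
        ["systemctl", "start", unit],
        ["ceph", "osd", "unset", "noout"],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises on the first failure, leaving the OSD stopped
            # and noout set for manual inspection.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(offline_compact_commands(36))  # dry run: prints the commands only
```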

      Also, not sure if this is helpful:

          osd.36 spilled over 192 KiB metadata from 'db' device (13 GiB used of 34 GiB) to slow device

          ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
          36   ssd 1.77879  1.00000 1.8 TiB 1.2 TiB 1.2 TiB 6.2 GiB 7.2 GiB 603 GiB 66.88 0.94  85     up     osd.36

      You can see the breakdown between OMAP data and META data.

      After compacting again:

          osd.36 spilled over 192 KiB metadata from 'db' device (26 GiB used of 34 GiB) to slow device

          ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
          36   ssd 1.77879  1.00000 1.8 TiB 1.2 TiB 1.2 TiB 6.2 GiB  20 GiB 603 GiB 66.88 0.94  85     up     osd.36

      So the OMAP size remained the same, while the metadata ballooned (and it still conspicuously spills over exactly 192 KiB).
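The before/after comparison can be made mechanical with a small parser for plain-text `ceph osd df` rows. This sketch assumes the column order shown above and that every size column prints as `<number> <unit>`; `parse_osd_df_row` and `diff_gib` are hypothetical names. For anything beyond a quick check, `ceph osd df --format json` avoids text parsing altogether.

```python
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
SIZE_COLUMNS = ["size", "raw_use", "data", "omap", "meta", "avail"]


def parse_osd_df_row(row: str) -> dict:
    """Parse one per-OSD row of plain-text `ceph osd df` output.

    Assumes the column order quoted above: ID CLASS WEIGHT REWEIGHT, then six
    size columns printed as '<number> <unit>', then %USE VAR PGS STATUS NAME.
    """
    tokens = row.split()
    parsed = {"id": int(tokens[0]), "class": tokens[1]}
    for i, name in enumerate(SIZE_COLUMNS):
        # Each size column occupies two tokens: the number and its unit.
        value, unit = tokens[4 + 2 * i], tokens[5 + 2 * i]
        parsed[name + "_gib"] = float(value) * UNITS[unit] / 2**30
    return parsed


def diff_gib(before: str, after: str) -> dict:
    """Per-column change in GiB between two rows for the same OSD."""
    b, a = parse_osd_df_row(before), parse_osd_df_row(after)
    return {name: a[name + "_gib"] - b[name + "_gib"] for name in SIZE_COLUMNS}
```

On the two osd.36 rows quoted above, it reports no change in OMAP and an increase of about 12.8 GiB in META.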
      These OSDs store a few RBD images, CephFS metadata, and librados objects (not RGW).

      The OMAP sizes across OSDs are spread pretty widely, but the GiB-sized ones are definitely the minority.
      Looking at the breakdown with some simple bash-fu:
      KiB = 147
      MiB = 105
      GiB = 24

      To break that down further, all of the GiB-sized OMAPs are on SSD OSDs:

              SSD   HDD  TOTAL
      KiB       0   147    147
      MiB      36    69    105
      GiB      24     0     24
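The bash-fu binning and the per-class split above amount to a cross-tabulation of the OMAP column by unit and device class. A Python sketch, assuming each OSD's device class and OMAP value have already been extracted (for example from `ceph osd df`); the function names are hypothetical, and the sample rows in the usage block are made up, not this cluster's data.

```python
from collections import Counter


def omap_unit_crosstab(rows) -> Counter:
    """Count OSDs per (OMAP unit, device class), plus a 'total' column.

    rows: iterable of (device_class, omap_size) pairs, e.g. ("ssd", "6.2 GiB").
    """
    counts = Counter()
    for dev_class, size in rows:
        unit = size.split()[1]
        counts[(unit, dev_class)] += 1
        counts[(unit, "total")] += 1
    return counts


def render(counts: Counter, classes=("ssd", "hdd"),
           units=("B", "KiB", "MiB", "GiB", "TiB")) -> str:
    """Lay the counts out like the table above; units with no OSDs are skipped."""
    cols = list(classes) + ["total"]
    lines = ["     " + "".join(f"{c.upper():>7}" for c in cols)]
    for unit in units:
        # Counter returns 0 for missing keys without inserting them.
        if counts[(unit, "total")]:
            lines.append(f"{unit:<5}" + "".join(f"{counts[(unit, c)]:>7}" for c in cols))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample rows, not this cluster's data.
    sample = [("ssd", "6.2 GiB"), ("ssd", "40 MiB"), ("hdd", "3.1 MiB"), ("hdd", "512 KiB")]
    print(render(omap_unit_crosstab(sample)))
```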

      I have no idea if any of these data points are pertinent or helpful, but I want to give as clear a picture as possible to avoid chasing the wrong thread.
      Appreciate any help with this.

      Thanks,
      Reed

            On May 26, 2020, at 9:48 AM, thoralf schulze <t.schulze@xxxxxxxxxxxx> wrote:

              hi there,

              trying to get my head around rocksdb spillovers and how to deal with them … in particular, i have one osd that does not have any pools associated with it (as per ceph pg ls-by-osd $osd), yet it does show up in ceph health detail as:

                  osd.$osd spilled over 2.9 MiB metadata from 'db' device (49 MiB used of 37 GiB) to slow device

              compaction doesn't help. i am well aware of https://tracker.ceph.com/issues/38745 , yet find it really counter-intuitive that an empty osd with a more-or-less optimally sized db volume can't fit its rocksdb on that volume.
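For context on the "more-or-less optimally sized db volume" point, this is the level-sizing arithmetic that usually comes up in DB-sizing discussions. It is a sketch under stated assumptions: RocksDB's default `max_bytes_for_level_base` of 256 MiB and `max_bytes_for_level_multiplier` of 10 (Ceph can override these through `bluestore_rocksdb_options`), and the simplification that a level only lands on the DB device if it fits there entirely. It reproduces the arithmetic only; it does not explain why an almost empty OSD spills over.

```python
MiB = 2**20
GiB = 2**30


def level_targets(base: int = 256 * MiB, multiplier: int = 10, levels: int = 6) -> list:
    """Target sizes of L1..L<levels> under level-style compaction."""
    return [base * multiplier**i for i in range(levels)]


def usable_db_bytes(db_size: int, base: int = 256 * MiB, multiplier: int = 10) -> int:
    """Cumulative size of the leading levels that fit entirely on a DB device.

    Assumption: a level is placed on the fast device only if it fits there
    completely; WAL, L0 and any reserved space are ignored.
    """
    total = 0
    for target in level_targets(base, multiplier):
        if total + target > db_size:
            break
        total += target
    return total


if __name__ == "__main__":
    for size_gib in (3, 34, 37, 300):
        usable = usable_db_bytes(size_gib * GiB)
        print(f"{size_gib} GiB DB -> {usable / GiB:.2f} GiB usable for L1..Ln")
```

Under these assumptions, both a 34 GiB and a 37 GiB DB hold 27.75 GiB worth of L1..L3, and the next level (250 GiB) would not fit.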

              is there any way to repair this, apart from re-creating the osd? fwiw, dumping the database with

                  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$osd dump > bluestore_kv.dump

              yields a file of less than 100 MB in size.

              and, while we're at it, a few more related questions:

              - am i right to assume that the leveldb and rocksdb arguments to ceph-kvstore-tool are only relevant for osds with a filestore backend?
              - does ceph-kvstore-tool bluestore-kv … also deal with the rocksdb items for osds with a bluestore backend?

              thank you very much & with kind regards,

              thoralf.

              _______________________________________________
              ceph-users mailing list -- ceph-users@xxxxxxx
              To unsubscribe send an email to ceph-users-leave@xxxxxxx
