Re: Any backfill in our cluster makes the cluster unusable and takes forever

Pavan Rallabhandi <PRallabhandi@xxxxxxxxxxxxxxx> · Wed, 19 Sep 2018 18:32:48 +0000

Looks like you are running on CentOS, fwiw. We’ve successfully run the conversion commands on Jewel, Ubuntu 16.04.

I have a feeling it’s expecting compression to be enabled. Can you try removing “compression=kNoCompression” from filestore_rocksdb_options? And/or you might want to check whether rocksdb is expecting snappy to be enabled.
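
To try the first suggestion, the option string can be rebuilt without the compression entry. This is only a minimal sketch: `strip_rocksdb_option` is a hypothetical helper name, and it assumes the comma-separated form of the string quoted further down in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph): drop one key from a
# comma-separated RocksDB option string.
# Usage: strip_rocksdb_option "<options>" "<key>"
strip_rocksdb_option() {
    local opts="$1" key="$2"
    printf '%s\n' "$opts" \
        | tr ',' '\n' \
        | grep -v "^${key}=" \
        | paste -s -d ',' -
}

strip_rocksdb_option \
    "max_background_compactions=8,compaction_readahead_size=2097152,compression=kNoCompression" \
    compression
# prints: max_background_compactions=8,compaction_readahead_size=2097152
```

The resulting string would then be set as filestore_rocksdb_options in ceph.conf before restarting the OSD.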

From: David Turner <drakonstein@xxxxxxxxx>
Date: Tuesday, September 18, 2018 at 6:01 PM
To: Pavan Rallabhandi <PRallabhandi@xxxxxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
Subject: EXT: Re:  Any backfill in our cluster makes the cluster unusable and takes forever

Here's the [1] full log from the time the OSD was started to the end of the crash dump.  These logs are so hard to parse.  Is there anything useful in them?

I did confirm that all perms were set correctly and that the superblock was changed to rocksdb before the first time I attempted to start the OSD with its new DB.  This is on a fully Luminous cluster with [2] the defaults you mentioned.

[1] https://gist.github.com/drakonstein/fa3ac0ad9b2ec1389c957f95e05b79ed
[2] "filestore_omap_backend": "rocksdb",
"filestore_rocksdb_options": "max_background_compactions=8,compaction_readahead_size=2097152,compression=kNoCompression",
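
For anyone repeating the superblock check described above, here is a small sketch that reports which backend name appears in an OSD's superblock. It assumes the backend name is stored as a plain string in that file, which is what the sed step in the conversion procedure relies on. `superblock_omap_backend` is a made-up helper name, not a Ceph tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the omap backend name(s) found in an OSD superblock.
# Assumption: the name ("leveldb" or "rocksdb") is stored as a plain string.
superblock_omap_backend() {
    local osd_dir="$1"
    # -a: treat the binary superblock as text; -o: print only the matched name.
    grep -a -o -E 'leveldb|rocksdb' "${osd_dir}/superblock" | sort -u
}

# Example (the OSD path is illustrative):
#   superblock_omap_backend /var/lib/ceph/osd/ceph-0
```

If both names are printed, the file still contains a leveldb reference somewhere.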

On Tue, Sep 18, 2018 at 5:29 PM Pavan Rallabhandi <PRallabhandi@xxxxxxxxxxxxxxx> wrote:
I meant the stack trace hints that the superblock still has leveldb in it. Have you verified that already?

On 9/18/18, 5:27 PM, "Pavan Rallabhandi" <PRallabhandi@xxxxxxxxxxxxxxx> wrote:

    You should be able to set them under the global section. That reminds me: since you are on Luminous already, I guess those values are already the default. You can verify from the admin socket of any OSD.
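
The admin socket check can be sketched as below. `show_omap_settings` is a hypothetical wrapper name; the `ceph daemon ... config get` calls must run on the host where that OSD's admin socket lives, and the OSD id is whatever OSD you pick.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: query the two filestore settings discussed in this
# thread from a running OSD's admin socket.
show_omap_settings() {
    local osd_id="$1"
    ceph daemon "osd.${osd_id}" config get filestore_omap_backend
    ceph daemon "osd.${osd_id}" config get filestore_rocksdb_options
}

# Example (OSD id is illustrative):
#   show_omap_settings 0
```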

    But the stack trace didn’t hint as if the superblock on the OSD is still considering the omap backend to be leveldb and to do with the compression.

    Thanks,
    -Pavan.

    From: David Turner <drakonstein@xxxxxxxxx>
    Date: Tuesday, September 18, 2018 at 5:07 PM
    To: Pavan Rallabhandi <PRallabhandi@xxxxxxxxxxxxxxx>
    Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
    Subject: EXT: Re:  Any backfill in our cluster makes the cluster unusable and takes forever

    Are those settings fine to set globally even if not all OSDs on a node have rocksdb as the backend?  Or will I need to convert all OSDs on a node at the same time?

    On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi <PRallabhandi@xxxxxxxxxxxxxxx> wrote:
    The steps that were outlined for conversion are correct. Have you tried setting some of the relevant ceph conf values too:

    filestore_rocksdb_options = "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"

    filestore_omap_backend = rocksdb

    Thanks,
    -Pavan.

    From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of David Turner <drakonstein@xxxxxxxxx>
    Date: Tuesday, September 18, 2018 at 4:09 PM
    To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
    Subject: EXT:  Any backfill in our cluster makes the cluster unusable and takes forever

    I've finally learned enough about the OSD backend to track down this issue to what I believe is the root cause.  LevelDB compaction is the common thread every time we move data around our cluster.  I've ruled out PG subfolder splitting, EC doesn't seem to be the root cause of this, and it is cluster wide as opposed to specific hardware.

    One of the first things I found after digging into leveldb omap compaction was [1] this article with a heading "RocksDB instead of LevelDB" which mentions that leveldb was replaced with rocksdb as the default db backend for filestore OSDs and was even backported to Jewel because of the performance improvements.

    I figured there must be a way to upgrade an OSD from leveldb to rocksdb without needing to fully backfill the entire OSD.  There is [2] this article, but you need to have an active service account with RedHat to access it.  I eventually came across [3] this article about optimizing Ceph Object Storage, which mentions migrating to rocksdb as a resolution for OSDs flapping due to omap compaction.  It links to the RedHat article, but also has [4] these steps outlined in it.  I tried to follow the steps, but the OSD I tested this on was unable to start with [5] this segfault.  And then trying to move the OSD back to the original LevelDB omap folder resulted in [6] this in the log.  I apologize that all of my logging is with log level 1.  If needed I can get some higher log levels.

    My Ceph version is 12.2.4.  Does anyone have any suggestions for how I can update my filestore backend from leveldb to rocksdb?  Or if that's the wrong direction and I should be looking elsewhere?  Thank you.

    [1] https://ceph.com/community/new-luminous-rados-improvements/
    [2] https://access.redhat.com/solutions/3210951
    [3] https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize Ceph object storage for production in multisite clouds.pdf

    [4] ■ Stop the OSD
    ■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
    ■ ulimit -n 65535
    ■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy /var/lib/ceph/osd/ceph-/current/omap 10000 rocksdb
    ■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap --command check
    ■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
    ■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
    ■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
    ■ Start the OSD
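
The steps in [4] can be sketched as a shell function. Several things here are assumptions rather than part of the quoted procedure: the OSD id is passed as an argument (the paths in [4] have it elided), the OSD is stopped and started through a systemd unit named ceph-osd@<id>, and `run` with its DRY_RUN switch is a hypothetical wrapper that only prints commands unless DRY_RUN=0 is set. The sketch also deliberately leaves omap.orig in place instead of deleting it, so the original LevelDB data is still there if the OSD fails to start.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Sketch of the conversion steps from [4] for one OSD id.
convert_omap_to_rocksdb() {
    local id="$1"
    local dir="/var/lib/ceph/osd/ceph-${id}"

    run systemctl stop "ceph-osd@${id}" || return 1   # assumed systemd unit name
    run mv "${dir}/current/omap" "${dir}/omap.orig" || return 1
    run ulimit -n 65535 || return 1
    run ceph-kvstore-tool leveldb "${dir}/omap.orig" store-copy \
        "${dir}/current/omap" 10000 rocksdb || return 1
    run ceph-osdomap-tool --omap-path "${dir}/current/omap" --command check || return 1
    run sed -i 's/leveldb/rocksdb/g' "${dir}/superblock" || return 1
    run chown -R ceph:ceph "${dir}/current/omap" || return 1
    # omap.orig is intentionally kept; remove it by hand once the OSD is healthy.
    run systemctl start "ceph-osd@${id}"
}

# Example: print the commands for OSD 0 without executing anything.
#   DRY_RUN=1 convert_omap_to_rocksdb 0
```

Reviewing the dry-run output first makes it easy to confirm the paths before anything is moved.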

    [5] 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy not supported or corrupted Snappy compressed block contents
    2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **

    [6] 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to mount object store
    2018-09-17 19:27:34.010131 7fcdee97cd80 -1  ** ERROR: osd init failed: (1) Operation not permitted
    2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167 (ceph:ceph)
    2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 361535
    2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty --pid-file
    2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load: isa
    2018-09-17 19:27:54.260520 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
    2018-09-17 19:27:54.261135 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
    2018-09-17 19:27:54.261750 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
    2018-09-17 19:27:54.261757 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
    2018-09-17 19:27:54.261758 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 'filestore splice' config option
    2018-09-17 19:27:54.286454 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
    2018-09-17 19:27:54.286572 7f7f03308d80  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
    2018-09-17 19:27:54.287119 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
    2018-09-17 19:27:54.287527 7f7f03308d80 -1 filestore(/var/lib/ceph/osd/ceph-0) mount(1723): Error initializing leveldb : Corruption: VersionEdit: unknown tag

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com