SSDs have the same issue, but depending on the drive they may be better at absorbing the extra IO, especially if network or CPU are the bigger bottlenecks. That's one of the reasons a lot of folks like to put the DB on flash for HDD-based clusters. It's still possible to oversubscribe them, but you've got more headroom.
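To put rough numbers on the "extra IO" compaction generates, here is a back-of-envelope sketch. The fanout value and the formula are illustrative assumptions about leveled compaction, not measurements from an OSD:

```python
# Rough estimate of RocksDB leveled-compaction write amplification.
# Assumption: data flushed to L0 gets rewritten roughly `fanout` times
# per level as it is compacted down the hierarchy, so total write-amp
# grows with the depth of the level hierarchy.

def estimated_write_amp(num_levels, fanout=10):
    """Illustrative write-amp estimate: 1x for the initial flush,
    plus ~fanout for every level the data is compacted through."""
    return 1 + num_levels * fanout

for levels in (2, 3, 4):
    print(f"L0..L{levels}: ~{estimated_write_amp(levels)}x write amplification")
```

The takeaway is only the trend: each extra level multiplies the background write traffic the disk has to absorb on top of client IO.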
Mark
On 4/12/19 10:25 AM, Charles Alva wrote:
Thanks Mark,
This is interesting. I'll take a look at the links you provided.
Does the rocksdb compaction issue only affect HDDs, or do SSDs have the same issue?
Kind regards,
Charles Alva
Sent from Gmail Mobile
On Fri, Apr 12, 2019, 9:01 PM Mark Nelson <mnelson@xxxxxxxxxx
<mailto:mnelson@xxxxxxxxxx>> wrote:
Hi Charles,
Basically the goal is to reduce write-amplification as much as possible. The deeper the rocksdb hierarchy gets, the worse the write-amplification for compaction is going to be. If you look at the OSD logs you'll see the write-amp factors for compaction in the rocksdb compaction summary sections that periodically pop up. There are a couple of things we are trying to see if we can improve on our end:
1) Adam has been working on experimenting with sharding data across multiple column families. The idea here is that it might be better to have multiple L0 and L1 levels rather than L0, L1, L2, and L3. I'm not sure if this will pan out or not, but that was one of the goals behind trying this.
2) Toshiba recently released trocksdb, which could have a really big impact on compaction write amplification:

Code: https://github.com/ToshibaMemoryAmerica/trocksdb/tree/TRocksRel
Wiki: https://github.com/ToshibaMemoryAmerica/trocksdb/wiki

I recently took a look to see if our key/value size distribution would work well with the approach that trocksdb is taking to reduce write-amplification:

https://docs.google.com/spreadsheets/d/1fNFI8U-JRkU5uaRJzgg5rNxqhgRJFlDB4TsTAVsuYkk/edit?usp=sharing
The good news is that it sounds like the "Trocks Ratio" for the data we put in rocksdb is sufficiently high that we'd see some benefit, since it should greatly reduce write-amplification during compaction for data (but not keys). This doesn't help your immediate problem, but I wanted you to know that you aren't the only one, and we are thinking about ways to reduce the compaction impact.
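The intuition behind the data-vs-keys distinction can be sketched in a few lines. This is a hedged illustration of key/value separation in general, not trocksdb's actual mechanism or its exact definition of the "Trocks Ratio"; the byte counts are made up:

```python
# Sketch of why separating values from keys reduces compaction writes:
# if compaction only rewrites keys (plus small pointers to values
# stored elsewhere), the bytes it moves shrink in proportion to how
# much of each entry is value payload. Numbers below are hypothetical.

def compaction_bytes_saved(key_bytes, value_bytes):
    """Fraction of compaction data writes avoided when values are
    stored outside the LSM tree and only keys get compacted."""
    total = key_bytes + value_bytes
    return value_bytes / total

# Hypothetical entry where the value dwarfs the key:
frac = compaction_bytes_saved(key_bytes=50, value_bytes=450)
print(f"~{frac:.0%} of compaction data writes avoided")  # prints ~90% ...
```

The higher the value-to-key byte ratio in the workload, the more of the compaction traffic this approach can eliminate, which is why the spreadsheet analysis above is encouraging.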
Mark
On 4/10/19 2:07 AM, Charles Alva wrote:
> Hi Ceph Users,
>
> Is there a way to minimize rocksdb compaction events so that they
> won't use all of the spinning disk IO utilization, and to avoid the OSD
> being marked down due to failing to send heartbeats to the others?
>
> Right now we have frequent high disk IO utilization every 20-25
> minutes, when rocksdb reaches level 4 with 67GB of data to compact.
>
>
> Kind regards,
>
> Charles Alva
> Sent from Gmail Mobile
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com