Re: How to reduce HDD OSD flapping due to rocksdb compacting event?

Mark Nelson <mnelson@xxxxxxxxxx> · Fri, 12 Apr 2019 08:51:20 -0500

Hi Charles,

Basically the goal is to reduce write-amplification as much as 
possible.  The deeper the rocksdb hierarchy gets, the worse the 
write-amplification for compaction is going to be.  If you look at the 
OSD logs you'll see the write-amp factors for compaction in the rocksdb 
compaction summary sections that periodically pop up. There are a couple 
of things we are trying on our end to see if we can improve this:

1) Adam has been experimenting with sharding data across multiple 
column families.  The idea here is that it might be better to have 
multiple L0 and L1 levels rather than L0, L1, L2 and L3.  I'm not 
sure if this will pan out or not, but that was one of the goals behind 
trying this.

2) Toshiba recently released trocksdb which could have a really big 
impact on compaction write amplification:

Code: https://github.com/ToshibaMemoryAmerica/trocksdb/tree/TRocksRel

Wiki: https://github.com/ToshibaMemoryAmerica/trocksdb/wiki

I recently took a look to see if our key/value size distribution would 
work well with the approach that trocksdb is taking to reduce 
write-amplification:

https://docs.google.com/spreadsheets/d/1fNFI8U-JRkU5uaRJzgg5rNxqhgRJFlDB4TsTAVsuYkk/edit?usp=sharing

The good news is that it sounds like the "Trocks Ratio" for the data we 
put in rocksdb is sufficiently high that we'd see some benefit since it 
should greatly reduce write-amplification during compaction for data 
(but not keys). This doesn't help your immediate problem, but I wanted 
you to know that you aren't the only one and we are thinking about ways 
to reduce the compaction impact.
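
To make the two ideas above a bit more concrete, here is a rough 
back-of-the-envelope model in Python.  It is only a sketch under stated 
assumptions, not how rocksdb or trocksdb actually account for 
compaction: it assumes leveled compaction with a 256MB L1 target and a 
10x size multiplier per level (the stock rocksdb defaults, which may 
differ from what your OSDs use), it charges one write for the flush to 
L0, one for L0->L1, and roughly one fanout's worth of rewriting for each 
deeper level, and for the key/value separation case it assumes values 
are written exactly once while only keys go through compaction.  The 
function names and the 0.9 value fraction are made up for illustration.

```python
GIB = 2**30
MIB = 2**20


def deepest_level(data_bytes, level_base=256 * MIB, fanout=10):
    """Smallest n such that levels L1..Ln together can hold data_bytes.

    Assumes L1 targets `level_base` bytes and every further level is
    `fanout` times larger than the one above it.
    """
    level = 1
    capacity = level_base
    total = level_base
    while total < data_bytes:
        level += 1
        capacity *= fanout
        total += capacity
    return level


def leveled_write_amp(depth, fanout=10):
    """Textbook-style estimate of write-amp for leveled compaction.

    1 write for the memtable flush to L0, ~1 for L0->L1, and ~fanout
    rewrites for each additional level the data is pushed down.
    """
    return 2 + fanout * (depth - 1)


def kv_separated_write_amp(depth, value_fraction, fanout=10):
    """Estimate when values live outside the LSM tree (assumption).

    Values are counted as written once; only the key share of the bytes
    pays the full leveled write-amp.  Pointer overhead is ignored.
    """
    lsm = leveled_write_amp(depth, fanout)
    return value_fraction * 1.0 + (1.0 - value_fraction) * lsm


if __name__ == "__main__":
    total = 67 * GIB  # size mentioned in the original question
    for shards in (1, 8, 32):  # hypothetical column family counts
        depth = deepest_level(total / shards)
        print(shards, "shard(s): depth", depth,
              "est. write-amp", leveled_write_amp(depth))

    depth = deepest_level(total)
    # 0.9 is a made-up value fraction, not a number from the spreadsheet
    print("kv-separated est. write-amp",
          kv_separated_write_amp(depth, value_fraction=0.9))
```

With those assumed defaults, 67GB in a single tree lands in L4, which 
lines up with what Charles reported.  Splitting the same data across 
more trees lowers the depth each one needs, and moving values out of the 
tree shrinks the share of bytes that pays the per-level cost.  Real 
numbers will differ, so treat the compaction summaries in the OSD logs 
as the source of truth.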

Mark

On 4/10/19 2:07 AM, Charles Alva wrote:
Hi Ceph Users,

Is there a way to minimize rocksdb compaction events so that they 
don't use up all of the spinning disk's IO and cause the OSD to be 
marked down for failing to send heartbeats to its peers?

Right now we see high disk IO utilization every 20-25 minutes, when 
rocksdb reaches level 4 with 67GB of data to compact.

Kind regards,

Charles Alva
Sent from Gmail Mobile

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com