Re: Large RocksDB (db_slow_bytes) on OSD which is marked as out

Igor Fedotov <ifedotov@xxxxxxx> · Mon, 31 Aug 2020 17:27:57 +0300

Could you please run:  ceph daemon <osd-id> calc_objectstore_db_histogram

and share the output?

On 8/31/2020 4:33 PM, Wido den Hollander wrote:

On 31/08/2020 12:31, Igor Fedotov wrote:
Hi Wido,

The 'b' prefix relates to the free list manager, which keeps all the 
free extents for the main device in a bitmap. Its records have a fixed 
size, hence you can easily estimate the overall size for this type of data.
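
A rough sketch of such an estimate, in Python. It multiplies a key count by an assumed fixed record size. The 128 blocks per key is what I believe to be the default of bluestore_freelist_blocks_per_key; the key and per-record overhead byte counts are placeholders rather than measured values. The 747,000 figure is the total key count reported in the original message of this thread, so it is an upper bound for the number of 'b' keys.

```python
def estimate_freelist_bytes(num_keys, blocks_per_key=128,
                            key_bytes=9, overhead_bytes=8):
    """Estimate the raw size of fixed-size bitmap freelist records.

    blocks_per_key: bits in each bitmap value (assumed default, not verified
                    against the cluster in question).
    key_bytes:      assumed key length (prefix plus 64-bit offset).
    overhead_bytes: assumed per-record storage overhead (placeholder).
    """
    value_bytes = blocks_per_key // 8  # one bit per block
    return num_keys * (key_bytes + value_bytes + overhead_bytes)


if __name__ == "__main__":
    total = estimate_freelist_bytes(747_000)
    print(f"{total / 1024**2:.1f} MiB")
```

Under these assumptions the result is in the tens of megabytes, several orders of magnitude below the roughly 21 GB reported for the database.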

Yes, so I figured.

But I doubt it takes that much. I presume that the DB just lacks 
proper compaction, which could happen eventually, but it looks like you 
interrupted the process by going offline.

Maybe try manual compaction with ceph-kvstore-tool?

This cluster is suffering from a lot of spillovers. So we tested with 
marking one OSD as out.

After being marked as out it still had this large DB. Compacting didn't 
help; the RocksDB database stayed just as large.

New OSDs coming into the cluster aren't suffering from this and they 
have a RocksDB of a couple of MB in size.

Old OSDs installed with Luminous and now upgraded to Nautilus are 
suffering from this.

It kind of seems like garbage data stays behind in RocksDB and is 
never cleaned up.

Wido

Thanks,

Igor

On 8/31/2020 10:57 AM, Wido den Hollander wrote:
Hello,

On a Nautilus 14.2.8 cluster I am seeing large RocksDB databases with 
many slow DB bytes in use.

To investigate this further I marked one OSD as out and waited for 
all the backfilling to complete.

Once the backfilling was completed I exported BlueFS and 
investigated the RocksDB using 'ceph-kvstore-tool'. This resulted in 
22GB of data.

Listing all the keys in the RocksDB shows me there are 747,000 keys 
in the DB. A small portion are osdmaps, but most are keys prefixed 
with 'b'.
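
For reference, a per-prefix tally like this can be produced with a few lines of Python. This is a minimal sketch that assumes the key listing has one key per line in the form "prefix<TAB>key" with URL-style escaping; that layout is an assumption about the tool's output, and the sample lines in the demo are made up for illustration.

```python
from collections import Counter
from urllib.parse import unquote


def count_prefixes(lines):
    """Count keys per prefix from lines shaped like 'prefix<TAB>key'."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        prefix, _, _key = line.partition("\t")
        counts[unquote(prefix)] += 1
    return counts


# Hypothetical sample input, not taken from the real database.
sample = ["b\t%00%00%10", "b\t%00%00%20", "O\tsome_key"]
for prefix, n in count_prefixes(sample).most_common():
    print(f"{prefix}\t{n}")
```

With a real listing saved to a file, passing the open file object to count_prefixes would give the same kind of breakdown.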

I dumped the stats of the RocksDB and this shows me:

L1: 1/0: 439.32 KB
L2: 1/0: 2.65 MB
L3: 5/0: 14.36 MB
L4: 127/0: 7.22 GB
L5: 217/0: 13.73 GB
Sum: 351/0: 20.98 GB

So there is almost 21GB of data in this RocksDB database. Why? Where 
is this coming from?
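
As a quick cross-check of those figures, the per-level sizes can be added up in Python. This is a sketch that assumes the KB/MB/GB units in the stats output are binary (powers of 1024); the values are copied from the listing above.

```python
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}  # assumed binary units

# Per-level sizes as reported in the RocksDB stats above.
LEVELS = {
    "L1": "439.32 KB",
    "L2": "2.65 MB",
    "L3": "14.36 MB",
    "L4": "7.22 GB",
    "L5": "13.73 GB",
}


def to_bytes(text):
    """Convert a string such as '7.22 GB' to a byte count."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


total = sum(to_bytes(size) for size in LEVELS.values())
deep = to_bytes(LEVELS["L4"]) + to_bytes(LEVELS["L5"])

print(f"total: {total / UNITS['GB']:.2f} GB")
print(f"share held in L4 and L5: {deep / total:.2%}")
```

The total comes out close to the reported Sum line, and nearly all of it sits in the two deepest levels.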

Throughout this cluster OSDs are suffering from many slow bytes used 
and I can't figure out why.

Has anybody seen this or has a clue on what is going on?

I have an external copy of this RocksDB database to do 
investigations on.

Thank you,

Wido
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx