Compacting omap data

Recently on one of our bigger clusters (~1,900 OSDs) running Luminous (12.2.8), we had a problem where OSDs would frequently get restarted while deep-scrubbing.

After digging into it, I found that a number of the OSDs had very large omap directories (50 GiB+).  I believe these were OSDs that had previously held PGs from the .rgw.buckets.index pool, which I recently moved to all SSDs; however, it seems the omap data remained behind on the HDDs.

I was able to reduce the data usage on most of the OSDs (from ~50 GiB to < 200 MiB!) by compacting the omap DBs offline, by setting 'leveldb_compact_on_mount = true' in the [osd] section of ceph.conf.  That didn't work on the newer OSDs, which use rocksdb; on those I had to do an online compaction using a command like:

$ ceph tell osd.510 compact
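For reference, the ceph.conf fragment used for the offline leveldb compaction looked like the following (we removed the option again afterwards, since while it is set a full compaction runs on every OSD start):

```ini
# ceph.conf -- compact the omap leveldb when the OSD mounts its store.
# Remove this again after the restart, or every subsequent start will
# trigger another full compaction.
[osd]
leveldb_compact_on_mount = true
```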

That worked, but today when I tried doing that on some of the SSD-based OSDs which are backing .rgw.buckets.index I started getting slow requests and the compaction ultimately failed with this error:

$ ceph tell osd.1720 compact
osd.1720: Error ENXIO: osd down

When I tried it again it succeeded:

$ ceph tell osd.1720 compact
osd.1720: compacted omap in 420.999 seconds

The data usage on that OSD dropped from 57.8 GiB to 43.4 GiB which was nice, but I don't believe that'll get any smaller until I start splitting the PGs in the .rgw.buckets.index pool to better distribute that pool across the SSD-based OSDs.
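For what it's worth, the split I'm referring to would be done roughly like this -- the pg_num target of 256 here is purely a placeholder, we haven't settled on a number yet:

```shell
# Illustrative only: raise the index pool's PG count (256 is a placeholder).
# On Luminous, pgp_num must be raised separately after pg_num.
$ ceph osd pool set .rgw.buckets.index pg_num 256
$ ceph osd pool set .rgw.buckets.index pgp_num 256
```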

My first question is: what is the equivalent option to do an offline compaction of rocksdb, so that I don't impact our customers while compacting the rest of the SSD-based OSDs?

The next question is whether there's a way to configure Ceph to automatically compact the omap DBs in the background without affecting the user experience.

Finally, I was able to spot that the omap directories were getting large because this cluster uses filestore, where omap is a plain directory on disk whose size is easy to check.  How would someone determine this when using BlueStore?

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com