Re: Compacting omap data

Brad Hubbard <bhubbard@xxxxxxxxxx> · Fri, 4 Jan 2019 12:30:05 +1000

Nautilus will make this easier.

https://github.com/ceph/ceph/pull/18096

On Thu, Jan 3, 2019 at 5:22 AM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
>
> Recently on one of our bigger clusters (~1,900 OSDs) running Luminous (12.2.8), we had a problem where OSDs would frequently get restarted while deep-scrubbing.
>
> After digging into it I found that a number of the OSDs had very large omap directories (50 GiB+).  I believe these were OSDs that had previously held PGs that were part of the .rgw.buckets.index pool, which I have recently moved to all SSDs; however, it seems like the data remained on the HDDs.
>
> I was able to reduce the data usage on most of the OSDs (from ~50 GiB to < 200 MiB!) by compacting the omap dbs offline, which I did by setting 'leveldb_compact_on_mount = true' in the [osd] section of ceph.conf, but that didn't work on the newer OSDs which use rocksdb.  On those I had to do an online compaction using a command like:
>
> $ ceph tell osd.510 compact
>
> That worked, but today when I tried doing that on some of the SSD-based OSDs which are backing .rgw.buckets.index I started getting slow requests and the compaction ultimately failed with this error:
>
> $ ceph tell osd.1720 compact
> osd.1720: Error ENXIO: osd down
>
> When I tried it again it succeeded:
>
> $ ceph tell osd.1720 compact
> osd.1720: compacted omap in 420.999 seconds
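
The retry described above (first attempt fails with ENXIO, second succeeds) could be scripted so that OSDs are compacted strictly one at a time. This is only a rough sketch, not a tested procedure: it assumes the ceph CLI exits non-zero when it prints an error such as "Error ENXIO: osd down", and the names CEPH_BIN, RETRIES, PAUSE, compact_osd and compact_osds are made up for illustration. The only real command it relies on is the "ceph tell osd.N compact" shown in the quoted text.

```shell
#!/usr/bin/env bash
# Sketch: compact OSD omap stores one at a time, retrying on failure.
# CEPH_BIN, RETRIES and PAUSE are hypothetical knobs for this example,
# not Ceph options.

CEPH_BIN="${CEPH_BIN:-ceph}"   # command used to talk to the cluster
RETRIES="${RETRIES:-3}"        # attempts per OSD before giving up
PAUSE="${PAUSE:-60}"           # seconds between attempts and between OSDs

# Run "ceph tell osd.<id> compact", retrying when the command fails
# (assumption: an error such as ENXIO yields a non-zero exit status).
compact_osd() {
    local id="$1" attempt=1
    while [ "$attempt" -le "$RETRIES" ]; do
        if "$CEPH_BIN" tell "osd.${id}" compact; then
            return 0
        fi
        echo "osd.${id}: attempt ${attempt}/${RETRIES} failed" >&2
        if [ "$attempt" -lt "$RETRIES" ]; then
            sleep "$PAUSE"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Compact the given OSD ids one after another, pausing after each one,
# and stop at the first OSD that still fails after all retries.
compact_osds() {
    local id
    for id in "$@"; do
        if ! compact_osd "$id"; then
            echo "giving up at osd.${id}" >&2
            return 1
        fi
        sleep "$PAUSE"
    done
}
```

After sourcing the file, something like "compact_osds 510 1720" would walk those two OSDs in order. Going one OSD at a time with a pause is a guess at what limits client impact; it does not avoid the slow requests seen while a single compaction is running.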
>
> The data usage on that OSD dropped from 57.8 GiB to 43.4 GiB, which was nice, but I don't believe it'll get any smaller until I start splitting the PGs in the .rgw.buckets.index pool to better distribute that pool across the SSD-based OSDs.
>
> The first question I have is: what is the option for doing an offline compaction of rocksdb, so that I don't impact our customers while compacting the rest of the SSD-based OSDs?
>
> The next question is whether there's a way to configure Ceph to automatically compact the omap dbs in the background without affecting the user experience.
>
> Finally, I was able to figure out that the omap directories were getting large because we're using filestore on this cluster, but how could someone determine this when using BlueStore?
>
> Thanks,
> Bryan
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Cheers,
Brad