Hi all
I'm seeing some behaviour I'd like to check on a Luminous (12.2.10) cluster that I'm running for rbd and rgw (mostly SATA filestore OSDs with NVMe journals, plus a few SATA-only bluestore OSDs). There's also a set of dedicated SSD OSDs running bluestore that hold the .rgw.buckets.index and .rgw.gc pools.
There's a long-running upload of small files, which I think is causing a large amount of leveldb compaction on the filestore nodes and rocksdb compaction on the bluestore nodes. The .rgw.buckets bluestore nodes were exhibiting noticeably higher load than the filestore nodes, although this seems to have been resolved by setting the following options on the bluestore SATA OSDs:
bluestore cache size hdd = 10737418240
osd memory target = 10737418240
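(To be explicit about how those were applied - this is roughly the ceph.conf snippet; the plain [osd] section on the bluestore SATA hosts is just my assumption about the right scope, and the OSDs were restarted afterwards:)

[osd]
# 10 GiB bluestore cache for HDD-backed OSDs, with a matching overall
# OSD memory target, on the bluestore SATA hosts
bluestore cache size hdd = 10737418240
osd memory target = 10737418240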
However, the bluestore nodes are still showing significantly higher CPU iowait and higher disk IO than the filestore nodes. Is there anything else I should be looking at tuning for bluestore, or is this expected given that bluestore doesn't benefit from the page cache the way filestore does?
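(For context, I've been assuming something like the following is the right way to confirm the new limits have actually taken effect and to see where the memory is going - osd.12 is just a placeholder id, run on that OSD's host via the admin socket:)

# check the running values on one of the bluestore SATA OSDs
ceph daemon osd.12 config get bluestore_cache_size_hdd
ceph daemon osd.12 config get osd_memory_target
# break down where the OSD's memory is currently going
# (bluestore cache, rocksdb, etc.)
ceph daemon osd.12 dump_mempools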
Whilst the upload has been running, a "radosgw-admin orphans find" was also being executed, although this was stopped manually before completion as a significant buildup in garbage collection has occurred. Looking into this, most of the outstanding garbage collection relates to a single bucket, which was shown to contain a large number of multipart/shadow objects. These now show up in the radosgw-admin gc list:
# radosgw-admin gc list | grep -c '"oid":'
224557347
# radosgw-admin gc list | grep '"oid":' | grep -v -c "default.1084171934.99"
3674322
# radosgw-admin gc list | head -1000 | grep '"oid":' | grep 1084171934
"oid": "default.1084171934.99__multipart_ServerImageBackup/95C48F007C44E36C-00-00.mrimg.tmp.2~MZ7fyct8yAWCUX82e9F-j9q-UJcnheP.1",
"oid": "default.1084171934.99__shadow_ServerImageBackup/95C48F007C44E36C-00-00.mrimg.tmp.2~MZ7fyct8yAWCUX82e9F-j9q-UJcnheP.1_1",
"oid": "default.1084171934.99__shadow_ServerImageBackup/95C48F007C44E36C-00-00.mrimg.tmp.2~MZ7fyct8yAWCUX82e9F-j9q-UJcnheP.1_2",
"oid": "default.1084171934.99__shadow_ServerImageBackup/95C48F007C44E36C-00-00.mrimg.tmp.2~MZ7fyct8yAWCUX82e9F-j9q-UJcnheP.1_3",
"oid": "default.1084171934.99__shadow_ServerImageBackup/95C48F007C44E36C-00-00.mrimg.tmp.2~MZ7fyct8yAWCUX82e9F-j9q-UJcnheP.1_4",
"oid": "default.1084171934.99__multipart_ServerImageBackup/95C48F007C44E36C-00-00.mrimg.tmp.2~MZ7fyct8yAWCUX82e9F-j9q-UJcnheP.2",
Despite running multiple "radosgw-admin gc process" commands alongside our radosgw processes, which has helped clear garbage collection in the past, the gc list is currently continuing to grow. After looking through some historic posts on this list, I believe I can loop through the gc list manually (a rough sketch is below), use "rados rm" to remove the objects from the .rgw.buckets pool, and then remove the garbage collection entries themselves. Is this a reasonable approach? Are there any recommendations for dealing with a garbage collection list of this size?
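Roughly what I had in mind, as a sketch rather than anything tested - it assumes the data pool is .rgw.buckets, that "gc list --include-all" is available on 12.2.10, and that dumping the full list to a temporary file (the /tmp path is just an example) is practical; I'd spot-check a sample of oids with "rados stat" before deleting anything:

# dump the full gc list once (including entries not yet due for processing)
radosgw-admin gc list --include-all > /tmp/gc-list.json

# pull out the oids belonging to the problem bucket's marker and delete
# them directly from the data pool
grep '"oid":' /tmp/gc-list.json \
  | grep 'default\.1084171934\.99__' \
  | sed -e 's/.*"oid": "//' -e 's/",*$//' \
  | while read -r oid; do
      rados -p .rgw.buckets rm "$oid"
    done

That only covers the data objects, not removing the gc entries themselves, which is the part I'm least sure how to handle safely at this scale.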
If there's any additional information I should provide for context here, please let me know.
Thanks for any help
Chris