RadosGW performance degradation at 18 million objects stored.

Hi All,

Asking for your assistance with RadosGW performance degradation once roughly 18M objects have been stored (http://pasteboard.co/g781YI3J.png).
The upload rate drops from 620 uploads/s to 180-190 uploads/s.
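
(For context on how the uploads/s numbers are produced: the load is parallel S3 PUTs against RGW. A trivial way to generate similar load - not necessarily the exact tool behind the graphs - assuming s3cmd is configured against the RGW endpoint, 'testbucket' exists and 'testfile' is a small local file:)

    # ~10k small-object PUTs from 32 parallel workers (all names illustrative)
    seq 1 10000 | xargs -P 32 -I{} s3cmd put testfile s3://testbucket/obj-{}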

I ran a series of tests and see that upload performance degrades by a factor of 3-4 once the number of objects reaches 18M.
The number of OSDs doesn't matter; the problem reproduces with 6/18/56 OSDs.
Increasing the number of index shards doesn't help. Originally I hit the problem with 8 shards per bucket; now it's 256, but the picture is the same.
The number of PGs on default.rgw.buckets.data also makes no difference, although the latest test with 2048 PGs (+nobarrier, +leveldb_compression = false) shows a slightly higher upload rate.
The problem reproduces even with an erasure-coded pool (tested 4+2). Erasure coding results in much higher inode usage (my first suspicion was a lack of RAM to cache inodes), but it doesn't matter - the rate drops at 18M as well. A rough sketch of how these variants were configured is below.
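
(Shard count, PG count and the EC profile were set roughly like this; the profile name and section placement are illustrative, not a verbatim record of every run:)

    # more index shards for newly created buckets (ceph.conf, [global] or the RGW client section)
    rgw_override_bucket_index_max_shards = 256

    # growing the data pool to 2048 PGs
    ceph osd pool set default.rgw.buckets.data pg_num 2048
    ceph osd pool set default.rgw.buckets.data pgp_num 2048

    # 4+2 erasure-coded data pool for the EC test
    ceph osd erasure-code-profile set ec-4-2 k=4 m=2 ruleset-failure-domain=host
    ceph osd pool create default.rgw.buckets.data 2048 2048 erasure ec-4-2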

I moved the meta/index pools to SSDs only and increased the number of RGW threads to 8192. That raised the rate from 250 to 600 uploads/s (and no more bad gateway errors), but didn't help with the drop at the 18M-object threshold.
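
(Roughly what that change looked like; the "ssd-only" rule name, <ruleset-id> and the civetweb port are placeholders, and on Jewel the SSD-only CRUSH root has to exist in the map already:)

    # SSD-only CRUSH rule for the small/hot pools
    ceph osd crush rule create-simple ssd-only ssd host
    ceph osd pool set default.rgw.buckets.index crush_ruleset <ruleset-id>
    # (same for the other small RGW metadata pools)

    # ceph.conf, RGW section - thread counts
    rgw_thread_pool_size = 8192
    rgw_frontends = civetweb port=7480 num_threads=8192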

Extra tunings (logbsize=256k, delaylog, allocsize=4M, nobarrier, leveldb_cache_size, leveldb_write_buffer_size, osd_pg_epoch_persisted_max_stale, osd_map_cache_size) were only applied on the last few tests. They didn't help much, but the upload rate became more stable, without dips.
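
(In ceph.conf terms that is roughly the following; only the values already mentioned above are filled in, the remaining tunables were raised from their defaults but the exact values are omitted here:)

    [osd]
    # note: setting this replaces the usual defaults (rw,noatime,inode64), so keep those if needed
    osd_mount_options_xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M,nobarrier
    # omap leveldb and osdmap cache tunables that were changed (exact values omitted)
    leveldb_compression = false
    leveldb_cache_size = ...
    leveldb_write_buffer_size = ...
    osd_pg_epoch_persisted_max_stale = ...
    osd_map_cache_size = ...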

From the per-HDD stats I see that at the 18M threshold the number of 'read' requests increases from 2-3 to
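
(The read counts come from the per-disk stats shown in the Grafana screenshots below; the same can be watched live on the OSD nodes with iostat, e.g.:)

    # per-device IOPS, throughput and %util, 5-second samples
    iostat -x 5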

Any ideas?

The Ceph cluster has 9 nodes:

- ceph-mon0{1..3} - 12G RAM, SSD
- ceph-node0{1..6} - 24G RAM, 9 OSDs with SSD journals

Mon servers run the ceph-mon daemons, haproxy (https-ecc) and civetweb services.

# ceph -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

# ceph -s
    cluster 9cb0840a-bd73-499a-ae09-eaa75a80bddb
     health HEALTH_OK
     monmap e1: 3 mons at {ceph-mon01=10.10.10.21:6789/0,ceph-mon02=10.10.10.22:6789/0,ceph-mon03=10.10.10.23:6789/0}
            election epoch 8, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
     osdmap e1476: 62 osds: 62 up, 62 in
            flags sortbitwise
      pgmap v68348: 2752 pgs, 12 pools, 2437 GB data, 31208 kobjects
            7713 GB used, 146 TB / 153 TB avail
                2752 active+clean
  client io 1043 kB/s rd, 48307 kB/s wr, 1043 op/s rd, 8153 op/s wr


Screenshots from the Grafana page:

Number of objects at the degradation moment: http://pasteboard.co/2B5OZ03d0.png
Disk util raised to 80%: http://pasteboard.co/2B6YREzoC.png
Disk operations - reads: http://pasteboard.co/2B8U8E33d.png

Thanks.

--
Kind regards,
Stas Starikevich, CISSP


